PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License

[Bug] uxrce_dds_client on main disconnects #22558

Closed AlexKlimaj closed 7 months ago

AlexKlimaj commented 9 months ago

Describe the bug

There appears to be a bug on main in uxrce_dds_client: it disconnects after running for only a few minutes. I'm not sure whether this is related to the recent NuttX DMA issues.

Once it disconnects from the flight controller side, restarting the agent or the client doesn't work. Only restarting the flight controller reconnects.

On release/1.14 it appears to work and does not disconnect.

To Reproduce

Run the uxrce_dds_client on main and it disconnects after a few minutes.

Expected behavior

uxrce_dds_client stays connected.

Output of status and top while it is working correctly:

nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, connected
INFO  [uxrce_dds_client] Using transport:     serial
INFO  [uxrce_dds_client] Payload tx:          50093 B/s
INFO  [uxrce_dds_client] Payload rx:          6931 B/s
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, connected
INFO  [uxrce_dds_client] Using transport:     serial
INFO  [uxrce_dds_client] Payload tx:          50720 B/s
INFO  [uxrce_dds_client] Payload rx:          8580 B/s
nsh> top

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
   0 Idle Task                  210993 45.140   272/  768   0 (  0)  READY  3
   1 hpwork                          0  0.000   292/ 1224 249 (249)  w:sem  3
   2 lpwork                          0  0.001   868/ 1576  50 ( 50)  w:sem  3
   3 nsh_main                        0  0.000  2164/ 3144 100 (100)  w:sem  4
   4 wq:manager                      0  0.000   636/ 1232 255 (255)  w:sem  5
   5 wq:lp_default                   5  0.205  1028/ 1896 205 (205)  w:sem  5
   6 Telnet daemon                   0  0.000   556/ 1984 100 (100)  w:sem  1
   7 netinit                         0  0.000   764/ 2024  49 ( 49)  w:sem  4
 872 mavlink_rcv_if0                 5  0.196  2292/ 6136 175 (175)  w:sem  5
  63 wq:hp_default                 101  3.955  1196/ 2368 237 (237)  w:sem  5
 237 dataman                         0  0.001  1068/ 1376  90 ( 90)  w:sem  5
 318 wq:I2C1                         1  0.052  1024/ 2312 246 (246)  w:sem  5
 329 wq:I2C2                         0  0.011   732/ 2312 245 (245)  w:sem  5
 331 wq:I2C3                         0  0.010  1024/ 2312 244 (244)  w:sem  5
 362 wq:SPI1                       129  5.082  1840/ 2368 253 (253)  w:sem  5
 376 wq:SPI2                       164  6.451  1840/ 2368 252 (252)  w:sem  5
 381 wq:SPI3                        64  2.510  1840/ 2368 251 (251)  w:sem  5
 396 wq:I2C4                         5  0.191   960/ 2312 243 (243)  w:sem  5
 494 wq:nav_and_controllers         80  3.154  1428/ 2216 242 (242)  w:sem  5
 495 wq:rate_ctrl                  142  5.591  2316/ 3120 255 (255)  w:sem  5
 496 wq:INS0                        96  3.731  3788/ 5976 241 (241)  w:sem  5
 497 wq:INS1                       102  4.005  3788/ 5976 240 (240)  w:sem  5
 498 wq:INS2                        97  3.824  3788/ 5976 239 (239)  w:sem  5
 500 commander                      15  0.592  1664/ 3192 140 (140)  w:sig  5
 761 gps                             0  0.028  1236/ 1936 205 (205)  w:sem  4
 860 mavlink_if0                    26  1.042  1932/ 3048 100 (100)  w:sig  5
 986 mavlink_rcv_if1                 0  0.000  2572/ 6136 175 (175)  w:sem  5
 983 mavlink_if1                     0  0.000  2116/ 3048 100 (100)  w:sem  5
1766 mavlink_shell                   0  0.000  1084/ 2000 100 (100)  w:sem  4
1051 uxrce_dds_client               99  3.919  9372/ 9872 100 (100)  w:sem 46
1077 wq:ttyS5                        7  0.281   860/ 1704 229 (229)  w:sem  5
1139 navigator                       2  0.094  1316/ 2104 105 (105)  w:sem 11
1752 logger                          4  0.175  3084/ 3616 230 (230)  w:sem  3
1759 wq:uavcan                      65  2.571  3140/ 3600 236 (236)  w:sem  5
1761 log_writer_file                 0  0.000   388/ 1144  60 ( 60)  w:sem  3
1764 mavlink_if2                   123  4.812  1996/ 3144 100 (100)  READY  7
1765 mavlink_rcv_if2                 5  0.208  2292/ 6136 175 (175)  w:sem  7
1767 netinit                         0  0.004   556/ 2024  49 ( 49)  w:sem  4
1850 top                            19  0.827  2036/ 4056 237 (237)  RUN    3

Processes: 39 total, 3 running, 36 sleeping
CPU usage: 53.53% tasks, 1.33% sched, 45.14% idle
DMA Memory: 5120 total, 1024 used 1536 peak
Uptime: 437.685s total, 210.994s idle
nsh> 
nsh> 

A few minutes later, it disconnected.

nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, disconnected
INFO  [uxrce_dds_client] Using transport:     serial
nsh> 
nsh> top

 PID COMMAND                   CPU(ms) CPU(%)  USED/STACK PRIO(BASE) STATE FD
   0 Idle Task                  351632 51.414   272/  768   0 (  0)  READY  3
   1 hpwork                          0  0.000   292/ 1224 249 (249)  w:sem  3
   2 lpwork                          0  0.000   868/ 1576  50 ( 50)  w:sem  3
   3 nsh_main                        0  0.000  2164/ 3144 100 (100)  w:sem  4
   4 wq:manager                      0  0.000   636/ 1232 255 (255)  w:sem  5
   5 wq:lp_default                   0  0.181  1028/ 1896 205 (205)  w:sem  5
   6 Telnet daemon                   0  0.000   556/ 1984 100 (100)  w:sem  1
   7 netinit                         0  0.000   764/ 2024  49 ( 49)  w:sem  4
 872 mavlink_rcv_if0                 0  0.189  2292/ 6136 175 (175)  w:sem  5
  63 wq:hp_default                   7  3.974  1196/ 2368 237 (237)  w:sem  5
 237 dataman                         0  0.000  1068/ 1376  90 ( 90)  w:sem  5
 318 wq:I2C1                         0  0.048  1024/ 2312 246 (246)  w:sem  5
 329 wq:I2C2                         0  0.000   732/ 2312 245 (245)  w:sem  5
 331 wq:I2C3                         0  0.000  1024/ 2312 244 (244)  w:sem  5
 362 wq:SPI1                        10  5.107  1840/ 2368 253 (253)  w:sem  5
 376 wq:SPI2                        12  6.418  1840/ 2368 252 (252)  w:sem  5
 381 wq:SPI3                         5  2.511  1840/ 2368 251 (251)  w:sem  5
 396 wq:I2C4                         0  0.176   960/ 2312 243 (243)  w:sem  5
 494 wq:nav_and_controllers          6  3.021  1428/ 2216 242 (242)  w:sem  5
 495 wq:rate_ctrl                   11  5.539  2316/ 3120 255 (255)  w:sem  5
 496 wq:INS0                         5  2.904  3788/ 5976 241 (241)  w:sem  5
 497 wq:INS1                         6  3.088  3788/ 5976 240 (240)  w:sem  5
 498 wq:INS2                         6  3.032  3788/ 5976 239 (239)  w:sem  5
 500 commander                       1  0.608  1664/ 3192 140 (140)  w:sig  5
 761 gps                             0  0.029  1236/ 1936 205 (205)  w:sem  4
 860 mavlink_if0                     2  1.046  1932/ 3048 100 (100)  w:sig  5
 986 mavlink_rcv_if1                 0  0.000  2572/ 6136 175 (175)  w:sem  5
 983 mavlink_if1                     0  0.000  2116/ 3048 100 (100)  w:sem  5
1766 mavlink_shell                   0  0.000  1084/ 2000 100 (100)  w:sem  4
1051 uxrce_dds_client                2  1.107  9372/ 9872 100 (100)  w:sem 67
1077 wq:ttyS5                        0  0.276   860/ 1704 229 (229)  w:sem  5
1139 navigator                       0  0.085  1316/ 2104 105 (105)  w:sem 11
1752 logger                          0  0.175  3084/ 3616 230 (230)  w:sem  3
1759 wq:uavcan                       5  2.829  3140/ 3600 236 (236)  w:sem  5
1761 log_writer_file                 0  0.000   388/ 1144  60 ( 60)  w:sem  3
1764 mavlink_if2                     9  4.703  1996/ 3144 100 (100)  READY  7
1765 mavlink_rcv_if2                 0  0.201  2292/ 6136 175 (175)  w:sem  7
1767 netinit                         0  0.000   556/ 2024  49 ( 49)  w:sem  4
1869 top                             0  0.000  2036/ 4056 237 (237)  RUN    3

Processes: 39 total, 3 running, 36 sleeping
CPU usage: 47.25% tasks, 1.34% sched, 51.41% idle
DMA Memory: 5120 total, 1024 used 1536 peak
Uptime: 724.650s total, 351.632s idle
nsh> 
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, connected
INFO  [uxrce_dds_client] Using transport:     serial
INFO  [uxrce_dds_client] Payload tx:          30162 B/s
INFO  [uxrce_dds_client] Payload rx:          9855 B/s
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, connected
INFO  [uxrce_dds_client] Using transport:     serial
INFO  [uxrce_dds_client] Payload tx:          30162 B/s
INFO  [uxrce_dds_client] Payload rx:          9855 B/s
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, connected
INFO  [uxrce_dds_client] Using transport:     serial
INFO  [uxrce_dds_client] Payload tx:          30162 B/s
INFO  [uxrce_dds_client] Payload rx:          9855 B/s
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, disconnected
INFO  [uxrce_dds_client] Using transport:     serial
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, disconnected
INFO  [uxrce_dds_client] Using transport:     serial
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, disconnected
INFO  [uxrce_dds_client] Using transport:     serial
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, disconnected
INFO  [uxrce_dds_client] Using transport:     serial
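
The point where the link drops can be located mechanically from a saved console capture. The following is a minimal sketch, assuming the `uxrce_dds_client status` output keeps the format shown above; the function names `parse_status_transcript` and `first_disconnect` are hypothetical helpers, not part of PX4.

```python
import re

def parse_status_transcript(text):
    """Return one dict per status call: state plus payload rates (None if absent)."""
    samples = []
    for line in text.splitlines():
        m = re.search(r"Running, (connected|disconnected)", line)
        if m:
            # A "Running, ..." line starts a new status sample.
            samples.append({"state": m.group(1), "tx": None, "rx": None})
            continue
        m = re.search(r"Payload (tx|rx):\s+(\d+) B/s", line)
        if m and samples:
            samples[-1][m.group(1)] = int(m.group(2))
    return samples

def first_disconnect(samples):
    """Index of the first disconnected sample that follows a connected one, else None."""
    for i in range(1, len(samples)):
        if samples[i - 1]["state"] == "connected" and samples[i]["state"] == "disconnected":
            return i
    return None

# Excerpt copied from the console output above.
transcript = """\
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, connected
INFO  [uxrce_dds_client] Using transport:     serial
INFO  [uxrce_dds_client] Payload tx:          30162 B/s
INFO  [uxrce_dds_client] Payload rx:          9855 B/s
nsh> uxrce_dds_client status
INFO  [uxrce_dds_client] Running, disconnected
INFO  [uxrce_dds_client] Using transport:     serial
"""

samples = parse_status_transcript(transcript)
print(samples)
print(first_disconnect(samples))
```

Run against a longer capture, this would also show whether the payload rates stop changing before the state flips, as they appear to do in the three identical samples above.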

Screenshot / Media

No response

Flight Log

NA

Software Version

main

Flight controller

ARKV6X

Vehicle type

None

How are the different components wired up (including port information)

No response

Additional context

No response

beniaminopozzan commented 9 months ago

Thanks @AlexKlimaj for spotting this. I cannot test it myself right now. Could you determine whether this issue is restricted to px4_fmu-v6x targets with serial communication, or whether it also affects other targets and UDP communication?

AlexKlimaj commented 9 months ago

It looks like I needed to revert two NuttX commits and use @dagar's DDS rework branch.

That branch is this PR: https://github.com/PX4/PX4-Autopilot/pull/22534

The two commits I reverted (@davids5): https://github.com/PX4/NuttX/commit/ed4814f6239097dde5eecf2b4fbd58661db84dda https://github.com/PX4/NuttX/commit/3dc3cf522758d70ca2a07c156e5e02517650327d

davids5 commented 9 months ago

The driver should be looked at for how it uses the serial port. Before, opening the port in non-blocking mode would still block. Now that the driver is fixed, it does not.

AlexKlimaj commented 9 months ago

It does look like it's opening it in non-blocking mode.

https://github.com/PX4/PX4-Autopilot/pull/22534/files#diff-c186a8a82bc9445fd93a8182d59a6d39855970f40612d9770855aea93df240eeR130
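
To illustrate the behavior change described above: on a descriptor that really is non-blocking, a bare read returns immediately with "would block" when no data is pending, so code that used to rely on the read waiting has to wait explicitly. The sketch below shows generic POSIX semantics only. It is not the PX4 or NuttX code; it uses an `os.pipe()` as a stand-in for the serial port, and `read_nonblocking` is a hypothetical helper.

```python
import os
import select

def read_nonblocking(fd, max_bytes, timeout_s):
    """Wait up to timeout_s for input on fd, then read it. Returns None if nothing arrived."""
    readable, _, _ = select.select([fd], [], [], timeout_s)
    if not readable:
        return None
    try:
        return os.read(fd, max_bytes)
    except BlockingIOError:
        # Data vanished between select() and read(); treat it like a timeout.
        return None

# A pipe stands in for the serial port (assumption made for the sake of a runnable example).
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

# With no data pending, a bare read does not wait; it fails with EAGAIN.
try:
    os.read(read_fd, 64)
    bare_read_result = "returned"
except BlockingIOError:
    bare_read_result = "would block"

timed_out = read_nonblocking(read_fd, 64, 0.05)   # nothing written yet
os.write(write_fd, b"ping")
received = read_nonblocking(read_fd, 64, 0.05)    # data is now available

os.close(read_fd)
os.close(write_fd)
print(bare_read_result, timed_out, received)
```

A read loop that treats an immediate "would block" as a transport error, or that spins without waiting, could behave differently once non-blocking opens stop blocking. Whether that is what happens in uxrce_dds_client is not established in this thread.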

Mendeler commented 8 months ago

It may be that the baud rate is too low. Try increasing it.

AlexKlimaj commented 8 months ago

> It may be that the baud rate is too low. Try increasing it.

Not the issue. Running at 3Mbps.
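
A rough check of that claim against the payload rates reported earlier in the issue. This sketch assumes 3 Mbps means a baud rate of 3,000,000 and assumes 8N1 framing (10 bits on the wire per byte); neither is stated in the thread. The figures are payload only, so protocol overhead is not included.

```python
def serial_utilization(payload_bytes_per_s, baud, bits_per_byte=10):
    """Fraction of the line rate consumed by payload (8N1 framing assumed by default)."""
    return payload_bytes_per_s * bits_per_byte / baud

BAUD = 3_000_000          # assumed meaning of "3Mbps"
TX_BYTES_PER_S = 50720    # "Payload tx" from the working status output
RX_BYTES_PER_S = 8580     # "Payload rx" from the working status output

tx_util = serial_utilization(TX_BYTES_PER_S, BAUD)
rx_util = serial_utilization(RX_BYTES_PER_S, BAUD)
print(f"tx {tx_util:.1%}, rx {rx_util:.1%}")  # tx 16.9%, rx 2.9%
```

Under these assumptions the link is far from saturated in either direction, which is consistent with the baud rate not being the cause.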