PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License
8.48k stars 13.51k forks source link

Mission download on slow telemetry can cause an excessive growth of mission requests #18082

Closed Hylke-Atmos closed 3 years ago

Hylke-Atmos commented 3 years ago

Describe the bug When a slow or somewhat "full" telemetry link is used. The mission upload (download from Px4 perspective), can cause an excessive growth of mission requests, making the problem even worse!

What happens:

  1. When an upload (e.g. from QGC) is triggered it wil send a mission count to PX4 to indicate how many mission elements there are.
  2. PX4 will start requesting element 1 (and starts a timeout trigger)
  3. QGC send element 1
  4. timeout occurred on receiving element 1 (timeout occurred) -> re-request element 1
  5. QGC sends element 1 again (let's call it element 1')
  6. element 1 is received by PX4, now request element 2.
  7. QGC sends element 2
  8. element 1' is received by PX4, "element 1 is not expected", therefor send request for element 2.
  9. QGC sends element 2'
  10. initial element 2 request timed out, re-request element 2.
  11. QGC sends element 2''
  12. Element 2 is received by PX4, now request element 3.
  13. QGC sends element 3
  14. element 2' is received by PX4, "element 1 is not expected", therefor send request for element 3.
  15. QGC sends element 3'
  16. et... etc...

Due to the significant increase of requests (and therefor answers of QGC) the telemetry link gets polluted and flooded with messages making the problem only worse!

To Reproduce This is difficult to reproduce if you don't have a telemetry modem that causes this level of delay. I guess (haven't tried it) it should also be possible to reproduce this by making the amount of used bandwidth higher than the available bandwidth (e.g. by increasing the send rate of some of the larger Mavlink Messages). This should also cause the timeout to be triggered quicker.

Expected behavior I would expect PX4 not to send requests when an unexpected element is received at all, or at least not for elements that have already been received! The timeout and re-request behavior (as designed per Mavlink) can cause multiple instances of the same element to be received, therefor the need for re-requesting an element whenever a received element is not equal to the one expected is IMHO wrong, since the a timeout will always re-request when it is not received in time...

Log Files and Screenshots Always provide a link to the flight log file: Unfortunately I haven't been able to create a decent logfile that contains either the debug or the Mavlink messages. I do have a screenshot showing an upload of a flightplan containing 100 waypoints. At the moment of taking it PX4 was requesting waypoint 93, but it had been running for about 20 minutes already and had been sending the mission_request message for over 22.000 time!!! At the end it was requesting a single WP a few thousand times each!!! Seriously high request count

Drone: Although unrelated: We are using VTOL drone and running PX4 on a Cube Orange.

What might be of interest: Radio: We use RFdesign rfd900x radios with a EU specific configuration (which is using a 10% duty cycle at 200mb/s over the air speed). We have Hardware flow control (forced) enabled.

dagar commented 3 years ago

Thanks for the detailed report.