ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0
11.02k stars 17.57k forks source link

repetitive high latency in mavlink message receiving #14561

Closed dwiel closed 3 years ago

dwiel commented 4 years ago

Bug report

Issue details

@pietrodn wrote a simple script to measure the round trip mavlink latency using the TIMESYNC message available here. It sends a TIMESYNC message at 100hz and measures the time until the flight controller responds with its corresponding TIMESYNC. What I measure on my set up is very similar to the plots here. This script was originally written to measure mavlink latency which was being inflated by dronekit, a python library used to wrap the mavlink library. That original bug has since been removed, but the same pattern is now visible in the arducopter SITL. I have ruled out dronekit itself as the source of my problem by measuring latency on the same firmware, in one case on real hardware, and another case in SITL. The problem only appears in SITL.

Here's the pattern when running on SITL:

Sending 30 pings to vehicle...
TIMESYNC seq=0 time=9.208 ms
TIMESYNC seq=1 time=31.576 ms
TIMESYNC seq=2 time=21.540 ms
TIMESYNC seq=3 time=11.472 ms
TIMESYNC seq=4 time=1.421 ms
TIMESYNC seq=5 time=23.797 ms
TIMESYNC seq=6 time=13.839 ms
TIMESYNC seq=7 time=3.838 ms
TIMESYNC seq=8 time=26.773 ms
TIMESYNC seq=9 time=16.769 ms
TIMESYNC seq=10 time=6.740 ms
TIMESYNC seq=11 time=29.203 ms
TIMESYNC seq=12 time=19.228 ms
TIMESYNC seq=13 time=9.228 ms
TIMESYNC seq=14 time=32.327 ms
TIMESYNC seq=15 time=22.349 ms
TIMESYNC seq=16 time=12.334 ms
TIMESYNC seq=17 time=2.347 ms
TIMESYNC seq=18 time=24.900 ms
TIMESYNC seq=19 time=14.893 ms
TIMESYNC seq=20 time=4.880 ms
TIMESYNC seq=21 time=27.756 ms
TIMESYNC seq=22 time=17.749 ms
TIMESYNC seq=23 time=7.718 ms
TIMESYNC seq=24 time=30.578 ms
TIMESYNC seq=25 time=20.558 ms
TIMESYNC seq=26 time=10.499 ms
TIMESYNC seq=27 time=0.443 ms
TIMESYNC seq=28 time=22.913 ms
TIMESYNC seq=29 time=12.944 ms
round-trip min/avg/max/stddev = 0.222/8.164/16.164/4.725

a plot of a separate run

image

Notice how the round trip time is much larger than it should be (this is SITL running on a 24-core cpu and communicating over mavlink on the same host) and additionally, each round trip time decreases by almost exactly 10 ms every 10 ms. The original bug this script was written for was caused by time.sleep(0.05) and has since been replaced with select(). Perhaps there is an analogous sleep in the SITL UART/mavlink code somewhere?

Version 4.0.3

Platform [ ] All [ ] AntennaTracker [X] Copter [ ] Plane [ ] Rover [ ] Submarine

Airframe type hexa

Hardware type mRo x2.1

dwiel commented 4 years ago

actually, just realized that SITL was running from master and I was running 4.0.3 on the real hardware. I'll try to run a 4.0.3 SITL test later today.

dwiel commented 4 years ago

Just confirmed that this problem appears on SITL in version 4.0.3

rmackay9 commented 4 years ago

Thanks for the investigation. @peterbarker is more the expert than I am but I wonder if increasing the priority of the gcs().update_recieve and update_send() would help. That can be done quite easily by moving it up in the scheduler tasks list. In a testing environment at least moving it right to the top would be interesting.

https://github.com/ArduPilot/ardupilot/blob/master/ArduCopter/Copter.cpp#L147

dwiel commented 4 years ago

I already tried that. I also tried increasing the maximum us. No luck

dwiel commented 4 years ago

I've also tried disabling a large number of the components I am not using in APM_Config.h such as the sprayer, etc.

rmackay9 commented 4 years ago

@dwiel, ok, the scheduler tasks list actually runs after the fast_loop, so the next test I would recommend is removing those two lines from the task list and then putting the equivalent two calls at the top of the fast_loop. https://github.com/ArduPilot/ardupilot/blob/master/ArduCopter/Copter.cpp#L222

I'm not sure exactly what the two lines will look like but probably something like:

gcs().update_receive() gcs().update_send()

peterbarker commented 4 years ago

On Tue, 9 Jun 2020, Zach Dwiel wrote:

Just confirmed that this problem appears on SITL in version 4.0.3

So SITL's clock isn't wall-clock.

You might try with a linux binary instead.

dwiel commented 4 years ago

I was wondering how SITL dealt with simulating the difference between a multi gigahertz host processor vs multi hundred megahertz flight controller processor.

What do you mean a linux binary instead? I did a quick search but I'm not finding much.

What I would really like to do is test high frequency control loops over mavlink. It sounds like the SITL probably isn't a good way to do this in either case?

peterbarker commented 4 years ago

On Tue, 9 Jun 2020, Zach Dwiel wrote:

I was wondering how SITL dealt with simulating the difference between a multi gigahertz host processor vs multi hundred megahertz flight controller processor.

We do not attempt to emulate processors.

What do you mean a linux binary instead? I did a quick search but I'm not finding much.

SITL compiles for Linux. But ArduPilot also runs on Linux-based flight controllers (navio2, bbbmini, ...)

Linux runs real-time on those processors, so may be suitable for your purposes. No simulated sensors or whatnot, 'though, so this might not be suitable for your use-case.

What I would really like to do is test high frequency control loops over mavlink. It sounds like the SITL probably isn't a good way to do this in either case?

As in running off-board control loops?

SITL will be useful for testing the code basically works - but communicating over an actual serial port to an actual autopilot will probably be your best test.

dwiel commented 4 years ago

On Tue, 9 Jun 2020, Zach Dwiel wrote: I was wondering how SITL dealt with simulating the difference between a multi gigahertz host processor vs multi hundred megahertz flight controller processor. We do not attempt to emulate processors.

make sense

What do you mean a linux binary instead? I did a quick search but I'm not finding much. SITL compiles for Linux. But ArduPilot also runs on Linux-based flight controllers (navio2, bbbmini, ...) Linux runs real-time on those processors, so may be suitable for your purposes. No simulated sensors or whatnot, 'though, so this might not be suitable for your use-case. What I would really like to do is test high frequency control loops over mavlink. It sounds like the SITL probably isn't a good way to do this in either case? As in running off-board control loops? SITL will be useful for testing the code basically works - but communicating over an actual serial port to an actual autopilot will probably be your best test.

yeah, running off-board control loops. I was hoping to use SITL to optimize the communication latency and avoid the manual firmware upgrade process. It sounds like that is the best path forward.

IamPete1 commented 3 years ago

Sounds like this is just a problem with the SITL not running real time not a inherent AP code issue. I'm going to close, but please re-open if this is a problem on none SITL builds, they certainly should be real time.