ArduPilot / MissionPlanner

Mission Planner Ground Control Station for ArduPilot (c# .net)
http://ardupilot.org/planner/
GNU General Public License v3.0
1.79k stars 2.42k forks source link

extremely slow param download with multiple vehicles #2784

Open tridge opened 2 years ago

tridge commented 2 years ago

testing with two vehicles (a ship and a quadplane) on a single RFD900x, the parameter download can take more than 10 minutes. The same link with mavproxy takes only a few seconds. The user tells me that with 6 vehicles param download takes over an hour. Watching the download, MissionPlanner quickly falls back to single PARAM_VALUE at a time. I found that if during the param download I forwarded the link to mavproxy and then mavproxy did a parameter fetch (not ftp) then MissionPlanner then got the parameters quickly as it saw the parameters fetched by mavproxy.

meee1 commented 2 years ago

Possible to share a log?

tridge commented 2 years ago

I've setup the same arrangement and radios at home and reproduced the problem. The logs below are for MissionPlanner and mavproxy both running on the same win10 machine. MAVProxy takes about 30s to get all the plane and rover parameters (one vehicle is plane, one is rover), MissionPlanner takes 10 minutes just to get the plane params. The rover params had started to download (slowly) when I disconnected. Interestingly, MissionPlanner gets a lot of duplicate parameters. It is like it has multiple threads downloading from the same vehicle. slowparam.zip

tridge commented 2 years ago

the setup is:

meee1 commented 2 years ago

im just going though this at the moment and is this safe? const ssize_t read_bytes = AP::FS().read(ftp.fd, reply.data, request.size);

                    there is no guard on request.size on both read and burstread
tridge commented 2 years ago

` there is no guard on request.size on both read and burstread

you're right, thanks! PR https://github.com/ArduPilot/ardupilot/pull/20024

tridge commented 2 years ago

@meee1 any update on this? It is also very slow when there is a mavlink gimbal attached (reported by a partner)

meee1 commented 2 years ago

The mavftp issue should be resolved, and was likerly due to large packets on a RFD. I think a retest would be good at the moment

tridge commented 2 years ago

@meee1 it is not fixed. I've also now tested with UDP over ethernet and it is just as bad, so it isn't a radio issue. The UDP setup I've used to reproduce is:

after mission planner connects it sees both boards, and then if you go to the config screen for the 2nd cube you see the parameter download slowly crawling along: image

tridge commented 2 years ago

using "watch PARAM_VALUE" in mavproxy with the above setup seems to show MissionPlanner downloading 2 parameters per second

meee1 commented 2 years ago

MP by default only pulls params on the primary, whats your seeing with the 2 params a second is just a slow pull of all params, from MAVs that are disarmed.

the initial log showed mavftp packets missing from the MAV, causing the large delays

this new issue your seeing i will look into further

tadam777 commented 2 years ago

Hi Michael and Andrew,

I'm the developer of Ultimate LRS and made a lot of testing of the telemetry between Ardupilot and Mission Planner over the last few years.

I noticed something that can be related to this issue : if the link has some lost packets, very quickly Mission Planner will after some time go into a "request one by one" parameters mode. In a perfect world, this would just require the time to transfer every missing parameter one by one.

But in practice there's a catch : Ardupilot has no way to know it has to stop sending all parameters (the initial mavlink request) : so in parallel to sending the requested parameters one by one, it continues to send its full list of parameters.

And this has an impact on Mission Planner, because when it requests parameters one by one, it expects to receive exactly that parameter ID, and not another one. If it receives another parameter, which happens because Ardupilot continues to send its full list, it's just dropped, and the "correct" parameter is requested again 3 times.

It's slowing down because the full list of parameters is received but all of its messages are dropped. And the one by one messages are requested several times before being received. It can also happen that if they are not received in a timely manner, will be requested again and again.

It can be seen as a competition between the full list that continues to be sent, and the one by one parameters requested : Ardupilot will try to serve both, but we're never certain if the next parameter will be from the full list or from the one by one request.

https://github.com/ArduPilot/MissionPlanner/blob/e2a83b383c41375326e6e810dca68114b9de2a14/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L2217-L2224

This can create a death spiral and become very slow : -request all parameters is still ongoing -request parameter 10 one by one -receive parameter 54 from the request all parameters : dropped -request again parameter 10 one by one -received parameter 10 (by chance the one requested, but could be another one from the all parameters) -request parameter 11 one by one -receive parameter 10 one by one (from the second request for parameter 10 one by one) : dropped with "already received" message as MP is waiting for packet 11 -receive parameter 55 from the request all parameters : dropped etc

In addition, if a correct parameter (the exact index waited for) isn't received for 4 seconds, it will request all parameters again, up to 3 times :

https://github.com/ArduPilot/MissionPlanner/blob/e8d6b2f03c3626051eb6e61d5851fd89b38a51e3/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L1992-L2004

So in the worst case, we receive several times the full list of parameters "in parallel", which are getting completely dropped out, and many one by one parameters as well. As we don't get the exact parameters waited for, the parameters are requested one by one several times, with a lot of "already received" messages.

My suggestion to solve this would be for Mission Planner to accept and consume any parameter received, even if not exactly the one expected. This will avoid entering a dead loop like above.

2) A secondary issue is that Mission Planner analyzes the parameter index to know when it received all missing packets, but the packet index isn't reliable as we can receive simultaneously packets from the "request all parameters" and from the "request one by one".

https://github.com/ArduPilot/MissionPlanner/blob/e2a83b383c41375326e6e810dca68114b9de2a14/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L1972-L1974

I don't think this issue has a high impact, but maybe in some cases it could trigger a new "request all parameters".

3) Something could be improved on Ardupilot side. If for some reason a full parameter list is requested, and before having received it completely, another request for full parameter list is done, Ardupilot will send the two lists of parameters. Which can slow down the link as every parameter packet is about 30 bytes. I noticed that in some cases it happens : Mission Planner requests several times the full list of parameters, but I didn't catch exactly why in the code. This can certainly lead to a major slowdown, if the full parameters list is requested several times in a loop.

My suggestion here would be that Ardupilot stops sending a first list of parameters (requested with a "request all parameters" message), before sending a new list, rather than sending the two or more lists "simultaneously".

In addition I think the mavlink protocol should include a message to stop a previous request of sending all parameters, and maybe a message to request a specific list of parameters in one message. (This said, probably most future boards will probably use MavFTP to transfer the parameters)

4) Maybe not directly related but I created an issue a few month ago about Ardupilot telemetry which was slower since version 3.9.0. Even if it's not the main cause, it can have an impact on the slowness :

https://github.com/ArduPilot/ardupilot/issues/14067

5) There's a last point that I think can contribute to the slowness, but I didn't test this one yet : independently from the "request all parameters", it seems that when parameters are requested "one by one", they are actually requested by groups of 10 (10 distinct requests), every 600 ms.

https://github.com/ArduPilot/MissionPlanner/blob/e8d6b2f03c3626051eb6e61d5851fd89b38a51e3/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L2008

However asking for 10 parameters every 600 ms is a bit too fast : Ardupilot will send only 10 parameters per second (if I remember well).

Ardupilot will answer all parameters requests, even if there are several requests for the same parameter. So I think this contributes to the slowness : there's a strong chance of requesting several times the same parameters one by one just because Arduplane can't send them as fast as requested.

Imagine 50 parameters are missing, and requested one by one. first 600 ms : Mission Planner asks for the 10 first parameters, and receives 6 parameters (10 parameters per second). next 600 ms : Mission Planner asks the 10 other parameters and receives 6 parameters etc

meee1 commented 2 years ago

what is the max speed we are talking about here? ie baud? ie what's a relisting data rate?

looking at your analysis

-request all parameters is still ongoing before this can happen, MP must NOT see a single param packet, for 4 seconds. https://github.com/ArduPilot/MissionPlanner/blob/e2a83b383c41375326e6e810dca68114b9de2a14/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L1993 and https://github.com/ArduPilot/MissionPlanner/blob/e2a83b383c41375326e6e810dca68114b9de2a14/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L1905 -request parameter 10 one by one yes, after the first 3 attempts at getting the entire param list, and not completing at least 75% of them https://github.com/ArduPilot/MissionPlanner/blob/e2a83b383c41375326e6e810dca68114b9de2a14/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L1996 -receive parameter 54 from the request all parameters : dropped incorrect all packets are accepted, but only those we need are added to the param list, as we already have it. https://github.com/ArduPilot/MissionPlanner/blob/e2a83b383c41375326e6e810dca68114b9de2a14/ExtLibs/ArduPilot/Mavlink/MAVLinkInterface.cs#L1925

ive stopped here, as it just repeats

so my questions are why are we not seeing any valid param packet for 4 seconds? what is the parameter output rate programmed into the autopilot? SR?_PARAM what is the typical throughput of the link? without loss. is flow control enabled on the link?

tadam777 commented 2 years ago

I'm normally using a 19200 bauds link without flow control, so the typical throughput is 1920 Bps, but for testing purposes I used an FTDI adapter directly connected to the FC (Mateksys F765 wing). MavFTP is deactivated.

The SR?_PARAM is at 10 Hz by default, and I tried different other rates but it seems that Ardupilot doesn't take it into account (tested on AP 4.1.3). The parameters are sent at about 25 Hz even if the SR?_PARAM are at 5Hz :

image

(We can see PARAM_VALUE messages received at 25Hz, while the SR?_PARAM are at 5 Hz)

I rechecked and you're right all incoming parameters are accepted.

I make some testing by stopping the flow during parameters transmission (removing the serial wire to the FTDI adapter), but with the latest version everything runs smoothly and missing parameters are correctly collected one by one at the end.

The only thing to notice is that if running at 19200 bauds, there's something with the get one by one parameters, maybe because the parameter transmission is slower. Ten parameters are requested, but about half of them are already received. This doesn't happen at faster rates such as 115200 bauds, only at 19200.

MP 1 3 77 @ 19200 bauds - provoked short interruption during parameters transmission.txt

INFO MissionPlanner.MAVLinkInterface - Get param 1 by 1 - got 1129 of 1165 INFO MissionPlanner.MAVLinkInterface - Already got 132 'BARO2_WCF_ENABLE' 96 INFO MissionPlanner.MAVLinkInterface - Already got 133 'BARO3_WCF_ENABLE' 96 INFO MissionPlanner.MAVLinkInterface - Already got 134 'GPS_TYPE' 96 INFO MissionPlanner.MAVLinkInterface - Already got 135 'GPS_TYPE2' 96 INFO MissionPlanner.MAVLinkInterface - 662 got param 136 of 1165 name: GPS_NAVFILTER 8 INFO MissionPlanner.MAVLinkInterface - 678 got param 137 of 1165 name: GPS_AUTO_SWITCH 1 INFO MissionPlanner.MAVLinkInterface - 694 got param 138 of 1165 name: GPS_MIN_DGPS 100 INFO MissionPlanner.MAVLinkInterface - 709 got param 139 of 1165 name: GPS_SBAS_MODE 2 INFO MissionPlanner.MAVLinkInterface - 772 got param 140 of 1165 name: GPS_MIN_ELEV -100 INFO MissionPlanner.MAVLinkInterface - 788 got param 141 of 1165 name: GPS_INJECT_TO 127 bps 1945 loss 2 left 17 mem 59,201171875 mav2 False sign False mav1 63 mav2 0 signed 0 HUD 17 hz drawtime 0 gl True INFO MissionPlanner.CurrentState - EKF red uninit EKF_UNINITIALIZED INFO MissionPlanner.MAVLinkInterface - Already got 136 'GPS_NAVFILTER' 97 INFO MissionPlanner.MAVLinkInterface - Already got 137 'GPS_AUTO_SWITCH' 97 INFO MissionPlanner.MAVLinkInterface - Get param 1 by 1 - got 1135 of 1165 INFO MissionPlanner.MAVLinkInterface - Already got 138 'GPS_MIN_DGPS' 97 INFO MissionPlanner.MAVLinkInterface - Already got 139 'GPS_SBAS_MODE' 97 INFO MissionPlanner.MAVLinkInterface - Already got 140 'GPS_MIN_ELEV' 97 INFO MissionPlanner.MAVLinkInterface - Already got 141 'GPS_INJECT_TO' 97 INFO MissionPlanner.MAVLinkInterface - 363 got param 142 of 1165 name: GPS_SBP_LOGMASK -256 INFO MissionPlanner.MAVLinkInterface - 379 got param 143 of 1165 name: GPS_RAW_DATA 0 INFO MissionPlanner.MAVLinkInterface - 409 got param 144 of 1165 name: GPS_GNSS_MODE 0 INFO MissionPlanner.MAVLinkInterface - 409 got param 145 of 1165 name: GPS_SAVE_CFG 2 INFO MissionPlanner.CurrentState - EKF red uninit EKF_UNINITIALIZED INFO MissionPlanner.MAVLinkInterface - Already got 142 'GPS_SBP_LOGMASK' 97 INFO MissionPlanner.MAVLinkInterface - Get param 1 by 1 - got 1139 of 1165 _INFO MissionPlanner.MAVLinkInterface - Already got 143 'GPS_RAW_DATA' 97 INFO MissionPlanner.MAVLinkInterface - Already got 144 'GPS_GNSSMODE' 97 bps 1907 loss 2 left 38 mem 59,8525390625 mav2 False sign False mav1 62 mav2 0 signed 0 _INFO MissionPlanner.MAVLinkInterface - Already got 145 'GPS_SAVECFG' 97 HUD 11 hz drawtime 1 gl True INFO MissionPlanner.MAVLinkInterface - 68 got param 146 of 1165 name: GPS_GNSS_MODE2 0 INFO MissionPlanner.MAVLinkInterface - 84 got param 147 of 1165 name: GPS_AUTO_CONFIG 1 INFO MissionPlanner.MAVLinkInterface - 100 got param 148 of 1165 name: GPS_RATE_MS 200 INFO MissionPlanner.MAVLinkInterface - 115 got param 149 of 1165 name: GPS_RATE_MS2 200 INFO MissionPlanner.MAVLinkInterface - 131 got param 150 of 1165 name: GPS_POS1_X 0 INFO MissionPlanner.MAVLinkInterface - 146 got param 151 of 1165 name: GPS_POS1_Y 0 INFO MissionPlanner.CurrentState - EKF red uninit EKF_UNINITIALIZED INFO MissionPlanner.CurrentState - EKF red uninit EKF_UNINITIALIZED _INFO MissionPlanner.MAVLinkInterface - Already got 146 'GPS_GNSS_MODE2' 98 INFO MissionPlanner.MAVLinkInterface - Already got 147 'GPS_AUTOCONFIG' 98 INFO MissionPlanner.MAVLinkInterface - Get param 1 by 1 - got 1145 of 1165 INFO MissionPlanner.MAVLinkInterface - Already got 148 'GPS_RATE_MS' 98 INFO MissionPlanner.MAVLinkInterface - Already got 149 'GPS_RATE_MS2' 98 INFO MissionPlanner.MAVLinkInterface - Already got 150 'GPS_POS1_X' 98 INFO MissionPlanner.MAVLinkInterface - Already got 151 'GPS_POS1_Y' 98 INFO MissionPlanner.MAVLinkInterface - 762 got param 152 of 1165 name: GPS_POS1_Z 0 INFO MissionPlanner.MAVLinkInterface - 774 got param 153 of 1165 name: GPS_POS2_X 0 INFO MissionPlanner.MAVLinkInterface - 804 got param 154 of 1165 name: GPS_POS2_Y 0 INFO MissionPlanner.MAVLinkInterface - 834 got param 155 of 1165 name: GPS_POS2_Z 0 INFO MissionPlanner.CurrentState - EKF red uninit EKF_UNINITIALIZED bps 1899 loss 2 left 28 mem 60,76953125 mav2 False sign False mav1 63 mav2 0 signed 0

This isn't blocking, but slows down the end of the parameters transmission.

To me it's like the one by one requested parameters are already sent by Ardupilot, but not yet received by Mission Planner, so the same parameters are requested again. Maybe are they in some serial buffer ?

I also made some checks with longer than 4 seconds break of the serial connection, to provoke a full parameters request. I can't get to a situation where it's really blocking, but here's an extract where we can see in parallel the parameters requested one by one, and not one but two full parameters transmission. This is at 115200 bauds.

The parameters requested one by one start at param 1048 The first full transmission is at param 475 (and provokes "already got" messages) The second simultaneous full transmission is at param 22 (and "already got" messages)

MP 1 3 77 @ 115200 bauds - provoked long interruption during parameters transmission.txt

INFO MissionPlanner.MAVLinkInterface - Get param 1 by 1 - got 1048 of 1165 INFO MissionPlanner.MAVLinkInterface - 795 got param 1048 of 1165 name: OSD1_CLIMBEFF_EN 0 INFO MissionPlanner.MAVLinkInterface - 858 got param 1049 of 1165 name: OSD1_CLIMBEFF_X 0 INFO MissionPlanner.MAVLinkInterface - 984 got param 1050 of 1165 name: OSD1_CLIMBEFF_Y 0 INFO MissionPlanner.MAVLinkInterface - 984 got param 1051 of 1165 name: OSD1_EFF_EN 0 INFO MissionPlanner.MAVLinkInterface - 984 got param 1052 of 1165 name: OSD1_EFF_X 22 INFO MissionPlanner.MAVLinkInterface - Already got 476 'TECS_THR_DAMP' 90 INFO MissionPlanner.MAVLinkInterface - Already got 475 'TECS_TIME_CONST' 90 INFO MissionPlanner.MAVLinkInterface - Already got 477 'TECS_INTEG_GAIN' 90 INFO MissionPlanner.MAVLinkInterface - Already got 478 'TECS_VERT_ACC' 90 INFO MissionPlanner.MAVLinkInterface - Already got 479 'TECS_HGT_OMEGA' 90 bps 1912 loss 3 left 2 mem 64,1640625 mav2 False sign False mav1 60 mav2 0 signed 0 INFO MissionPlanner.MAVLinkInterface - GetParam name: '' or index: 22 1:1 INFO MissionPlanner.MAVLinkInterface - set giveComport current False new True INFO MissionPlanner.MAVLinkInterface - set giveComport current True new False INFO MissionPlanner.MAVLinkInterface - GetParam name: '' or index: 23 1:1 INFO MissionPlanner.MAVLinkInterface - set giveComport current False new True INFO MissionPlanner.MAVLinkInterface - set giveComport current True new False HUD 16 hz drawtime 1 gl True INFO MissionPlanner.CurrentState - EKF red uninit EKF_UNINITIALIZED INFO MissionPlanner.MAVLinkInterface - 64 got param 1053 of 1165 name: OSD1_EFF_Y 10 INFO MissionPlanner.MAVLinkInterface - 80 got param 1054 of 1165 name: OSD1_BTEMP_EN 0 INFO MissionPlanner.MAVLinkInterface - 80 got param 1055 of 1165 name: OSD1_BTEMP_X 0 INFO MissionPlanner.MAVLinkInterface - 174 got param 1056 of 1165 name: OSD1_BTEMP_Y 0 INFO MissionPlanner.MAVLinkInterface - 174 got param 1057 of 1165 name: OSD1_ATEMP_EN 0 INFO MissionPlanner.MAVLinkInterface - Already got 22 'SERIAL6_OPTIONS' 90 INFO MissionPlanner.MAVLinkInterface - Already got 23 'SERIAL_PASS1' 90 INFO MissionPlanner.MAVLinkInterface - Already got 480 'TECS_SPD_OMEGA' 90 INFO MissionPlanner.MAVLinkInterface - Get param 1 by 1 - got 1058 of 1165 INFO MissionPlanner.MAVLinkInterface - Already got 481 'TECS_RLL2THR' 90 INFO MissionPlanner.MAVLinkInterface - Already got 482 'TECS_SPDWEIGHT' 90 INFO MissionPlanner.MAVLinkInterface - Already got 483 'TECS_PTCH_DAMP' 90 INFO MissionPlanner.MAVLinkInterface - Already got 484 'TECS_SINK_MAX' 90 INFO MissionPlanner.MAVLinkInterface - 456 got param 1058 of 1165 name: OSD1_ATEMP_X 0 INFO MissionPlanner.MAVLinkInterface - 583 got param 1059 of 1165 name: OSD1_ATEMP_Y 0 INFO MissionPlanner.MAVLinkInterface - 583 got param 1060 of 1165 name: OSD1_BAT2_VLT_EN 0 INFO MissionPlanner.MAVLinkInterface - 583 got param 1061 of 1165 name: OSD1_BAT2_VLT_X 0 INFO MissionPlanner.MAVLinkInterface - Already got 485 'TECS_LAND_ARSPD' 91 INFO MissionPlanner.MAVLinkInterface - Already got 486 'TECS_LAND_THR' 91 INFO MissionPlanner.MAVLinkInterface - Already got 487 'TECS_LAND_SPDWGT' 91 INFO MissionPlanner.MAVLinkInterface - Already got 488 'TECS_PITCH_MAX' 91 INFO MissionPlanner.MAVLinkInterface - Already got 489 'TECS_PITCH_MIN' 91 INFO MissionPlanner.MAVLinkInterface - 676 got param 1062 of 1165 name: OSD1_BAT2_VLT_Y 0 INFO MissionPlanner.MAVLinkInterface - 676 got param 1063 of 1165 name: OSD1_BAT2USED_EN 0 INFO MissionPlanner.MAVLinkInterface - 676 got param 1064 of 1165 name: OSD1_BAT2USED_X 0 INFO MissionPlanner.MAVLinkInterface - 769 got param 1065 of 1165 name: OSD1_BAT2USED_Y 0 INFO MissionPlanner.MAVLinkInterface - 769 got param 1066 of 1165 name: OSD1_ASPD2_EN 0 INFO MissionPlanner.MAVLinkInterface - 769 got param 1067 of 1165 name: OSD1_ASPD2_X 0 INFO MissionPlanner.CurrentState - EKF red uninit EKF_UNINITIALIZED INFO MissionPlanner.MAVLinkInterface - Already got 490 'TECS_LAND_SINK' 91 INFO MissionPlanner.MAVLinkInterface - Already got 491 'TECS_LAND_TCONST' 91 INFO MissionPlanner.MAVLinkInterface - Already got 492 'TECS_LAND_DAMP' 91 INFO MissionPlanner.MAVLinkInterface - Already got 493 'TECS_LAND_PMAX' 91 INFO MissionPlanner.MAVLinkInterface - Already got 494 'TECS_APPR_SMAX' 91 INFO MissionPlanner.MAVLinkInterface - GetParam name: '' or index: 24 1:1 INFO MissionPlanner.MAVLinkInterface - set giveComport current False new True INFO MissionPlanner.MAVLinkInterface - set giveComport current True new False INFO MissionPlanner.MAVLinkInterface - GetParam name: '' or index: 25 1:1 INFO MissionPlanner.MAVLinkInterface - set giveComport current False new True INFO MissionPlanner.MAVLinkInterface - set giveComport current True new False HUD 16 hz drawtime 1 gl True bps 2021 loss 3 left 33 mem 60,45703125 mav2 False sign False mav1 64 mav2 0 signed 0 INFO MissionPlanner.MAVLinkInterface - Already got 24 'SERIAL_PASS2' 91

Again it's not blocking, but slowing down as all parameters are sent multiple times. This last case requires a 4 second interruption in the transmission.

A suggestion would be to detect if a full parameters transmission is ongoing, and not send additional request for parameters while it's ongoing (as it's impossible to ask Ardupilot to stop a full parameters transmission).

But these are just small improvements, on the latest Mission Planner version I can't create a situation where the parameters transmission would take more than about 1 minute.