ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0

Propellers are moving when trying to download 500 logfiles through UART #11491

Open jeroentaverne opened 5 years ago

jeroentaverne commented 5 years ago

Bug report

Issue details: While trying to download a large number of log files (500 in my case) through MAVLink, the propellers start moving. Checking the OneShot125 signals to the speed controllers with an oscilloscope shows that they briefly go to 150 usec.
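For reference, a 150 usec pulse is well inside the active throttle range of OneShot125, which runs from 125 usec (zero throttle) to 250 usec (full throttle), i.e. standard 1000 to 2000 usec PWM divided by 8. The sketch below works through that arithmetic. The function names are made up for illustration, and it assumes the ESC maps the full range linearly; a calibrated ESC may use a narrower range.

```cpp
#include <cassert>

// OneShot125 pulse range in microseconds: 125 us = zero, 250 us = full throttle.
constexpr double kOneShot125MinUs = 125.0;
constexpr double kOneShot125MaxUs = 250.0;

// Convert a measured pulse width to a throttle fraction in [0, 1].
// Assumption: the ESC maps the whole 125-250 us range linearly.
double oneshot125_throttle_fraction(double pulse_us)
{
    assert(pulse_us >= 0.0);
    if (pulse_us <= kOneShot125MinUs) {
        return 0.0;
    }
    if (pulse_us >= kOneShot125MaxUs) {
        return 1.0;
    }
    return (pulse_us - kOneShot125MinUs) / (kOneShot125MaxUs - kOneShot125MinUs);
}

// Equivalent standard PWM pulse width: OneShot125 is standard PWM divided by 8.
double oneshot125_to_pwm_us(double pulse_us)
{
    return pulse_us * 8.0;
}
```

Under these assumptions, 150 usec corresponds to a 1200 usec standard PWM pulse, or about 20% throttle, which would be enough to spin the propellers.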

Version 3.6.7

Platform [ ] All [ ] AntennaTracker [x] Copter [ ] Plane [ ] Rover [ ] Submarine

Airframe type Octa

Hardware type Pixhawk 1

Logs NA

jeroentaverne commented 5 years ago

By changing MAX_LOG_FILES to 300 in DataFlash_File.cpp the problem occurs less often.

peterbarker commented 5 years ago

Scope traces, please?

jeroentaverne commented 5 years ago

Find them attached. I also noticed that during logfile transfers the oneshot signal starts jittering a lot, so the STM32F4 loop time is affected. I think at some point the oneshot output totally stops and a failsafe (in the STM32F1) might be activated, which generates the 150 usec pulse. After the transfer the oneshot signal is steady again.

[Attached scope traces: DS1104Z_DS1ZA170603340_2019-06-06_08 37 25, DS1104Z_DS1ZA170603340_2019-06-06_08 38 14]

WickedShell commented 5 years ago

@jeroentaverne What pin on the autopilot is this? With a pixhawk 1 it's important to distinguish between main and aux outputs. (A param file may also help someone try and reproduce this)

jeroentaverne commented 5 years ago

It's the M1 pin of the Pixhawk 1, not AUX, so it's an output of the STM32F1 IO processor. I have tried to find out what this processor does when communication with the STM32F4 is lost for some time, but I can't find the recent source code. The parameter file can be found here: https://we.tl/t-9XExx95H6k

tridge commented 5 years ago

You're running Copter 3.6.7, which uses the NuttX IO firmware. Can you please try master so you can compare with the ChibiOS IO firmware? Note that 3.6.7 uses the NuttX IO firmware even with ChibiOS on the FMU.

jeroentaverne commented 5 years ago

Thanks, I will try. Which stable version includes the ChibiOS IO firmware?

jeroentaverne commented 5 years ago

I have tried the latest 3.6.9 and also modified 3.6.7 to use fmuv2_IO_ChibiOS.bin. In both cases the oneshot125 outputs no longer trigger motor rotation and stay well under 126 usec, so that's good. But the signal to the motors stops completely during the log list download, and then the ESCs start generating alarm sounds. It really seems that SD card access is stopping the control loop. I thought that the control loop had the highest priority in the RTOS?

jeroentaverne commented 5 years ago

By modifying the function DataFlash_File::get_num_logs() so that it directly returns 500 without doing a lot of SD card access, the alarm sounds are gone, but there is still a lot of jitter in the periods between the oneshot125 pulses. It looks like SD card access isn't interrupted at all by the control loop. Perhaps it's an idea to determine the number of log files only at boot time?
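The boot-time idea could look roughly like the sketch below: scan the SD card once while a slow scan is harmless, then answer later requests from memory. `CachedLogCount` and its methods are hypothetical names for illustration and are not part of ArduPilot. The sketch also ignores log deletion and wrap-around at MAX_LOG_FILES, which a real implementation would need to handle.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper, not ArduPilot code: runs an expensive log count once
// and serves later requests from memory.
class CachedLogCount {
public:
    // `scan` stands in for the slow directory walk over the SD card.
    explicit CachedLogCount(std::function<uint16_t()> scan) : _scan(std::move(scan)) {}

    // Call once at boot, before arming, when a slow scan does no harm.
    void init()
    {
        _count = _scan();
        _valid = true;
    }

    // No file system access here, so it is cheap to call at any time.
    uint16_t get() const
    {
        assert(_valid);
        return _count;
    }

    // Keep the cache in step when the logger opens a new log file.
    void on_log_created()
    {
        assert(_valid);
        _count++;
    }

private:
    std::function<uint16_t()> _scan;
    uint16_t _count = 0;
    bool _valid = false;
};
```

With this shape, a log list request only reads an integer, so whatever jitter remains would have to come from the transfer itself rather than from counting files.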

rmackay9 commented 5 years ago

On the dev call we discussed that the best solution was to get the MAVLink message processing moved to a new thread.
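As a generic illustration of that idea, and not ArduPilot's actual design, the sketch below hands slow requests to a worker thread through a queue so that the caller returns immediately. It uses standard C++ threads; on the autopilot this would go through the HAL's own thread and semaphore primitives instead. `SlowRequestWorker` is a hypothetical name.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical worker: runs queued jobs on its own thread so the caller
// (for example a main loop) never blocks on slow work such as file access.
class SlowRequestWorker {
public:
    SlowRequestWorker() : _thread([this] { run(); }) {}
    ~SlowRequestWorker() { stop(); }

    // Queue a job and return immediately.
    void submit(std::function<void()> job)
    {
        assert(job);
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _jobs.push(std::move(job));
        }
        _cv.notify_one();
    }

    // Let the worker finish everything already queued, then join it.
    void stop()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopping = true;
        }
        _cv.notify_one();
        if (_thread.joinable()) {
            _thread.join();
        }
    }

private:
    void run()
    {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _cv.wait(lock, [this] { return _stopping || !_jobs.empty(); });
                if (_jobs.empty()) {
                    return;  // stop requested and queue drained
                }
                job = std::move(_jobs.front());
                _jobs.pop();
            }
            job();  // run outside the lock so submit() is never held up
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _jobs;
    bool _stopping = false;
    std::thread _thread;  // declared last so the other members exist before it starts
};
```

The hard part in a real autopilot is not the queue but making every piece of state touched by the message handlers safe to access from two threads.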

jeroentaverne commented 5 years ago

So you think the issue is caused by MAVLink and not related to SD card access? Is it possible to give me some guidance on how to implement this new thread? Thanks. :-)

peterbarker commented 5 years ago

On Mon, 17 Jun 2019, jeroentaverne wrote:

> So you think the issue is caused by Mavlink and not related to SD card access? Is it possible to give me some guidance how to implement this new thread? Thanks. :-)

The current theory is that interrupting the main thread to do file system operations is causing the problems.

It's a great theory.

It would be nice to verify it before we dive headlong into the gcs-on-a-thread idea - that will be fun, but involved and tricky :-)

If you could hack up some function somewhere to just pause the main thread for some (gradually increasing?) number of milliseconds and see if that replicates the problem - that would be handy.
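A minimal sketch of such a hack, assuming it is called from somewhere in the main loop on a test bench with propellers removed. Both helper names are hypothetical. The stall is a busy-wait on a steady clock, which stands in for a blocking SD card call by holding the calling thread for the requested time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical test helper: stall length for the n-th injection, growing
// linearly from start_ms in steps of step_ms and capped at max_ms.
uint32_t stall_ms_for_step(uint32_t n, uint32_t start_ms, uint32_t step_ms, uint32_t max_ms)
{
    const uint64_t ms = uint64_t(start_ms) + uint64_t(n) * step_ms;
    return ms > max_ms ? max_ms : uint32_t(ms);
}

// Hold the calling thread for at least stall_ms milliseconds and return the
// time actually spent, in microseconds.
int64_t stall_calling_thread(uint32_t stall_ms)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    const auto target = std::chrono::milliseconds(stall_ms);
    while (clock::now() - start < target) {
        // spin on purpose: the thread must not make progress during the stall
    }
    const auto elapsed = clock::now() - start;
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

Calling `stall_calling_thread(stall_ms_for_step(n, 1, 2, 100))` once every few seconds with an increasing `n`, while watching the outputs on the scope, would show at which stall length the pulses start to jitter or drop out.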

jeroentaverne commented 5 years ago

Has this problem been solved in 3.6.9 or 3.6.10? I noticed some changes to the source code involved in SD card access.

rmackay9 commented 5 years ago

@jeroentaverne, we haven't looked into this report much, so I don't think we know if it is resolved in 3.6.9 or 3.6.10. I suspect it hasn't been resolved.

jeroentaverne commented 3 years ago

Has this been resolved? I see the same problem with 4.0.3.

rmackay9 commented 3 years ago

@jeroentaverne,

I still think it has not been investigated or resolved. Now is perhaps a good time to resolve it as part of the 4.1 release though. If it can be reproduced with 4.1 then I can pretty easily add it to the 4.1 issues list.

As a workaround, I wonder if setting MOT_SAFE_DISARM = 1 will resolve the problem.