Open VDLJu opened 4 years ago
Thanks for the very detailed report including logs.
@VDLJu I have looked at this quite carefully and unfortunately I don't yet have a clue as to the cause. I notice a high baud rate of 921600 on telem1. Was there a companion computer attached? If so, do you have a record of the mavlink stream to/from the companion computer?
The closest thing I have to a clue on this one so far is the thread priority of 183 in the WDOG message. A priority of 183 means the monitor thread was running at the time of the fault. The monitor thread uses very little CPU (it sleeps almost all the time), so the fact that it was running could be significant. I did wonder whether the monitor thread's stack size, which is 512, is enough when there is a delay that triggers the MON logging. I set up a test to reproduce that and found it does have enough stack (about 192 bytes free when logging a MON message). Right now the only guess I have is a nested interrupt happening during a MON message write and causing stack corruption, but I can't prove that at all, and I can't reproduce it.
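For readers unfamiliar with how a "bytes free" figure for a thread stack can be obtained, here is a minimal sketch of the common fill-pattern (high-water mark) technique. It is a simulation in Python, not ArduPilot or ChibiOS code, and it is not necessarily the method used in the test described above. The fill byte `0x55`, the helper name `free_stack_bytes`, and the 320-byte peak usage are assumptions chosen only so the numbers line up with the 512-byte stack and roughly 192 bytes free quoted in the comment.

```python
STACK_SIZE = 512   # monitor thread stack size quoted in the thread
FILL_BYTE = 0x55   # assumed fill pattern, hypothetical for this sketch


def free_stack_bytes(stack, fill=FILL_BYTE):
    """Count untouched fill bytes from the low end of a descending stack."""
    free = 0
    for value in stack:
        if value != fill:
            break
        free += 1
    return free


# Pre-fill the whole stack with the pattern, as would be done at thread creation.
stack = bytearray([FILL_BYTE] * STACK_SIZE)

# Simulate a peak usage of 320 bytes (illustrative figure, not a measurement).
# A descending stack grows from the high addresses towards the low ones.
peak_used = 320
for i in range(STACK_SIZE - peak_used, STACK_SIZE):
    stack[i] = 0xAA  # stand-in for real stack contents

print(free_stack_bytes(stack))
```

The scan can overestimate the free space if real stack contents happen to equal the fill byte right at the boundary, so the result is best treated as an approximate high-water mark.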
Unfortunately the companion computer doesn't log autopilot state. It mostly saves some external events, such as image capture, and performs some real-time functions, such as translating mavlink to the RC telemetry format. The high baud rate is just there to minimize latency.
Let me know if I can help you somehow.
I thought it would be better to share some previous flights, in case they can give some insight into this mystery.
FL: Fault Line 100, the source code line number where the fault occurred - can it give us some information?
@mmk0102, yes, I think Peter and Tridge did use that information and narrowed down which line was last executed before the watchdog triggered, but it wasn't clear how this could possibly cause the problem.
@tridge was this the DMA-teardown/setup race condition bug?
Is anyone working on this bug?
@kumariitian121,
I suspect this particular watchdog has been fixed and this was on a pretty old version of AP (4.0.x). Have you encountered a watchdog reset with 4.1.x or 4.2.0?
I did, using FW 4.2.0 on a mroControlZeroF7. It gets triggered every time I switch to LAND mode. I did a lot of flights with the same FW on a mroControlZeroH7 and never encountered this issue...
Bug report
Issue details
The autopilot was reset by a watchdog in flight while executing a mission. The reset happened after executing the last mission item, a LOITER_WAIT command for 10 seconds, a few seconds after LOITER_WAIT was executed.
The following watchdog error line was logged:
Task: -2, meaning the fast loop had started
FL: 100, Fault Line, the source code line number where the fault occurred
FT: 3, Fault Type; 3 = Hard Fault (the most common)
FA: 404947019, Fault Address (in memory)
FP: 183, Thread Priority
ICSR: 4196355, Interrupt Control and State Register
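The FA and ICSR values are logged in decimal, which makes them hard to read. The sketch below converts them to hex and splits ICSR into its bit fields. The field positions are taken from the ARMv7-M ICSR layout as I understand it and should be checked against the Arm documentation for the specific core; the names `ICSR_FIELDS` and `decode_icsr` are hypothetical helpers for this example and are not part of ArduPilot.

```python
# (shift, width) for each ICSR field; assumed ARMv7-M layout, verify before relying on it.
ICSR_FIELDS = {
    "VECTACTIVE": (0, 9),    # bits 8:0, number of the active exception
    "RETTOBASE": (11, 1),    # bit 11
    "VECTPENDING": (12, 9),  # bits 20:12, highest-priority pending exception
    "ISRPENDING": (22, 1),   # bit 22, an external interrupt is pending
    "ISRPREEMPT": (23, 1),   # bit 23
    "PENDSTSET": (26, 1),    # bit 26, SysTick pending
    "PENDSVSET": (28, 1),    # bit 28, PendSV pending
    "NMIPENDSET": (31, 1),   # bit 31, NMI pending
}


def decode_icsr(value):
    """Split a raw ICSR value into named bit fields."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in ICSR_FIELDS.items()
    }


ICSR_RAW = 4196355   # ICSR value from the watchdog line
FA_RAW = 404947019   # FA value from the watchdog line

icsr = decode_icsr(ICSR_RAW)
print(hex(ICSR_RAW))
print(hex(FA_RAW))
for name, field in icsr.items():
    print(f"{name:12s} = {field}")
```

Under these assumptions ICSR is 0x400803: VECTACTIVE is 3, which is the HardFault exception number and agrees with FT 3, while RETTOBASE and ISRPENDING are both set and VECTPENDING is 0. The fault address is 0x1823004b in hex. What these values imply about the root cause is left open here, as it is in the discussion.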
Logs
bin log pre WD reset, bin log after WD reset, telemetry log
Version
ArduCopter 4.0.3
Platform
[ ] All
[ ] AntennaTracker
[X] Copter
[ ] Plane
[ ] Rover
[ ] Submarine
Airframe type
X4 copter
Hardware type
Cube Black
Previous discussion in the forum, link to it
https://discuss.ardupilot.org/t/crash-ac-4-0-3-watchdog-reset-in-flight-while-executing-a-mission/57652