Open Jaaaky opened 5 years ago
thanks! some of these issues are now fixed, but I'd like to run through them all with more testing
Follow-up:
Regarding "baro not healthy": it's really due to the watchdog reset. I sent pull request #11246 with a suggested solution. I've tested it myself.
Regarding IOMCU_RESET: it seems there is a real problem with the IOMCU on Pixhawk. I've tried different watchdog timeouts on it, rebuilding and flashing both firmwares, but I still get "IOMCU reset" every ~20 minutes on average. Example of the code snippet used:
#ifdef IOMCU_FW
IWDGD.RLR = 0xFF;
#else
IWDGD.RLR = 0xFFF;
#endif
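For reference, the effect of these RLR values on the timeout can be estimated with the standard STM32 IWDG relation, timeout = (RLR + 1) * prescaler / f_LSI. The sketch below is only an illustration: the helper name iwdg_timeout_ms is hypothetical, and the prescaler (16) and LSI frequencies (about 40 kHz on the F1-class IOMCU, about 32 kHz on the F4 FMU) used in the note after it are assumptions, not values taken from this thread. The LSI oscillator also has a wide tolerance, so real timeouts will vary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate IWDG timeout in milliseconds (rounded down).
 *   rlr       : 12-bit reload value written to IWDG RLR
 *   prescaler : effective clock divider (4, 8, 16, ... 256)
 *   lsi_hz    : LSI oscillator frequency in Hz (assumed, board dependent)
 * Hypothetical helper for illustration; not part of ArduPilot or ChibiOS.
 */
uint32_t iwdg_timeout_ms(uint32_t rlr, uint32_t prescaler, uint32_t lsi_hz)
{
    assert(lsi_hz > 0);
    assert(rlr <= 0xFFFu);
    /* 64-bit intermediate avoids overflow for large prescalers */
    return (uint32_t)(((uint64_t)(rlr + 1u) * prescaler * 1000u) / lsi_hz);
}
```

Under those assumptions, 0xFF works out to roughly 102 ms at 40 kHz, 0x1FF to roughly 204 ms, and 0xFFF to roughly 2048 ms at 32 kHz.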
On watchdog reset: why would we run check_crc for the IOMCU on startup?
Is it needed? Or could we change this line to something like:
if ((!boardconfig || boardconfig->io_enabled() == 1) && !hal.util->was_watchdog_reset()) {
Do we really need hal.scheduler->delay(2000);
before it? I tried skipping it on watchdog reset on a Pixhawk1; the reset was amazingly fast, almost instant. I didn't get any scheduler panic. But I don't know what could happen on slower boards.
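The idea can be modelled roughly as follows. This is a standalone sketch, not ArduPilot code: the struct and function names are hypothetical, and it assumes the 2000 ms delay exists only to precede the CRC check, which may not hold on every board.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what the FMU does with the IOMCU at boot. */
struct io_startup_plan {
    bool     check_crc;  /* run the IO firmware CRC check */
    uint32_t delay_ms;   /* wait before talking to the IOMCU */
};

/*
 * Decide the startup steps. After a watchdog reset the CRC check and the
 * delay are both skipped so the FMU gets back to flying as fast as possible.
 */
struct io_startup_plan plan_io_startup(bool io_enabled, bool was_watchdog_reset)
{
    struct io_startup_plan plan = { false, 0 };

    if (!io_enabled || was_watchdog_reset) {
        return plan;  /* nothing to check, or fast recovery path */
    }
    plan.check_crc = true;
    plan.delay_ms  = 2000;  /* assumed to be needed only for a cold boot */
    return plan;
}
```

On a normal boot this keeps the existing behaviour; only the watchdog recovery path changes.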
What if the IOMCU was also stuck at that critical moment, waiting for its 2 second watchdog reset timeout? Would an FMU scheduler panic occur? Would another watchdog reset happen? Could a small ~500ms IOMCU watchdog avoid this?
Would you like an IOMCU reset counter like this added to the printf output to help debugging? https://github.com/ArduPilot/ardupilot/compare/master...Jaaaky:iomcu_reset_counter?expand=1
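A counter of that kind might look roughly like the sketch below. It is not the code from the linked branch; it assumes, purely for illustration, that the FMU periodically receives the IOMCU uptime in milliseconds and treats a value going backwards as a reset. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical FMU-side bookkeeping for IOMCU resets. */
struct io_reset_tracker {
    uint32_t last_uptime_ms;  /* last uptime reported by the IOMCU, 0 = none yet */
    uint32_t reset_count;     /* number of resets seen since FMU boot */
};

/*
 * Feed one uptime sample. Returns 1 if this sample indicates a reset
 * (uptime went backwards), 0 otherwise. A 32-bit millisecond wrap after
 * about 49 days would also count as a reset in this simple model.
 */
int io_reset_tracker_update(struct io_reset_tracker *t, uint32_t uptime_ms)
{
    int reset = 0;

    if (t->last_uptime_ms != 0 && uptime_ms < t->last_uptime_ms) {
        t->reset_count++;
        reset = 1;
    }
    t->last_uptime_ms = uptime_ms;
    return reset;
}
```

The running reset_count could then be included in the existing "IOMCU reset" message so that logs show how often it happens.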
@tridge I think there is definitely a problem in the IOMCU. It resets too often, usually every 13 to 15 minutes, without apparent reason. I've tested this on multiple different Pixhawk1 boards: just powered over USB with LOG_DISARMED enabled and left for some hours. No cables, external sensors or servos connected. Tested with and without an SD card. I found that the watchdog timeout makes no real difference to the reset rate. So I've changed it permanently to "IWDGD.RLR = 0x1FF" on my builds and did hours of flight testing without a problem. It still resets, but with no noticeable problems so far.
But I think we should find a way to know why it resets. I saw you've added some useful log info for the FMU watchdog which helps diagnose the cause and the stuck task. Isn't it possible for the IOMCU to send such info to the FMU while resetting too?
Hi @tridge, I recently had a case where the IOMCU didn't recover after freezing. It reset once or twice, then stayed frozen until I power cycled the board. Is there any way to get IO logs to debug such issues? I can try to work on this if you guide me.
any updates? have these issues been resolved?
@Jaaaky any update?
@IamPete1 Not sure if you can consider it resolved. The initially reported issue should be resolved, but the cases where the IOMCU didn't recover after freezing are still there. It can be reproduced using an ICE with a "bad" throttle servo mounting.
I have a feeling this has to have been electrical.
I haven't heard of any similar cases.
@Jaaaky did you ever get to the bottom of this?
Yes, it's caused by EMI pulses from bad spark plugs when the throttle servo is mounted on top of the engine. The IOMCU has to be hard reset.
Bug report
Issue details
Testing master code, I've found some problems due to or related to watchdog reset:
1- The accelcalsimple MAV cmd instantly triggers a watchdog reset; I've sent pull request #11231 with a fix.
2- Setting Q_PLANE = 1 and SCHED_LOOP_RATE to 400 on Pixhawk triggers IOMCU_RESET frequently. I thought it might be related to the SD card, but removing the card completely didn't improve things much.
3- After an IOMCU reset, "prearm: barometer not healthy" is triggered, preventing safe arming.
4- Last and most annoying: the Pixhawk connection died after accelcal (the usual one, not simple). The IOMCU is reset during calibration, usually multiple times, causing USB interface failure, and I had to power cycle the board to be able to reconnect. The problem repeats almost every time accelcal is tried.
Using master code a few months ago, I noticed that servo outputs are usually reset and halted during accel calibration until it's done (tell me if this needs a separate bug report). So I think it's related: the IOMCU is somehow halted during calibration, which causes the IOMCU watchdog reset. As I understand it, expect_delay_ms is not passed to the IOMCU, and the IOMCU currently couldn't make use of it anyway because no timer thread runs there. Correct me if I'm wrong.
Version master commit #4a237af09307 arduplane
Platform [ ] All [ ] AntennaTracker [ ] Copter [x] Plane [ ] Rover [ ] Submarine Just tested on ArduPlane.
Hardware type What autopilot hardware was used? (Pixhawk, Cube, Pixracer, Navio2, etc) Pixhawk
Logs Is any needed?
By the way, I was trying to trace the accelcal problem by modifying the IOMCU code, building iofirmware and flashing it with arduplane.apj. After many trials the IOMCU stopped working completely on any firmware, even old NuttX ones. I'm not sure if it's a mere hardware problem; I'm afraid it was caused by too many IOMCU watchdog resets or by frequent firmware flashing. Just recording the note in case someone faces the same problem on master code.