MB3hel / AUVControlBoard

Vectored 6-DOF motion controller for AUVs.
https://mb3hel.github.io/AUVControlBoard/

[BUG] Command processing fails due to I2C lockup #16

Closed MB3hel closed 1 year ago

MB3hel commented 1 year ago

Describe the bug: An I2C lockup is possible on v1 due to some oddities in the hardware state machine for I2C SERCOMs. When the lockup occurs, I2C operations appear to start correctly, but no interrupt ever fires to indicate a finished operation (no completion, no error, etc.). This is a "silent" failure as far as software is concerned.

When this occurs, the i2c perform function takes the mutex and waits for the semaphore. However, the semaphore is never given by the interrupt callback function, because the interrupt never fires. As a result, the mutex is held forever.

This often occurs first on the sensor data threads, causing sensor data to stop working. However, an IMU set axis command will cause the cmdctrl thread to attempt an i2c operation. This results in the cmdctrl thread blocking forever.
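The blocking described above can be illustrated with a small host-side model. This is only a sketch, not the firmware's code: `sim_i2c_t`, `sim_tick`, and `i2c_perform_bounded` are hypothetical names, and the mutex, semaphore, and interrupt are simulated with plain flags. It shows one possible mitigation, a bounded wait that always releases the mutex, so a lockup surfaces as a timeout error instead of a thread that blocks forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the I2C driver state (not the real firmware types). */
typedef struct {
    bool mutex_held;     /* models the I2C bus mutex */
    bool done_flag;      /* models the semaphore given by the interrupt callback */
    uint32_t irq_after;  /* simulated tick at which the interrupt fires;
                            UINT32_MAX means it never fires (the lockup case) */
    uint32_t resets;     /* how many times a peripheral reset was requested */
} sim_i2c_t;

typedef enum { I2C_OK, I2C_BUSY, I2C_TIMEOUT } i2c_result_t;

/* Simulated interrupt: sets the done flag once its delay has elapsed. */
static void sim_tick(sim_i2c_t *bus, uint32_t now)
{
    if (bus->irq_after != UINT32_MAX && now >= bus->irq_after) {
        bus->done_flag = true;
    }
}

/* Perform one transfer, waiting at most timeout_ticks for completion.
 * The mutex is released on every path, including the timeout path. */
i2c_result_t i2c_perform_bounded(sim_i2c_t *bus, uint32_t timeout_ticks)
{
    assert(bus != NULL);
    if (bus->mutex_held) {
        return I2C_BUSY;
    }
    bus->mutex_held = true;
    bus->done_flag = false;

    i2c_result_t result = I2C_TIMEOUT;
    for (uint32_t t = 0; t < timeout_ticks; t++) {
        sim_tick(bus, t);
        if (bus->done_flag) {
            result = I2C_OK;
            break;
        }
    }

    if (result == I2C_TIMEOUT) {
        bus->resets++;  /* placeholder for reinitializing the peripheral */
    }
    bus->mutex_held = false;
    return result;
}
```

With `irq_after` set to `UINT32_MAX`, the call returns `I2C_TIMEOUT` and leaves `mutex_held` false, so a later caller such as the cmdctrl thread would get an error rather than block. If the firmware runs on FreeRTOS (an assumption here), the equivalent change would be passing a finite tick count instead of `portMAX_DELAY` to `xSemaphoreTake`.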

Tested Versions: v1, v2

Impacted Versions: v1

To Reproduce: Steps to reproduce the behavior (INCLUDE WHAT BUILD CONFIGURATION USED):

  1. Run the firmware (debug and release tested)
  2. Launch the sensordata interface script
  3. Rapidly tap the I2C pins on the BNO055 or the ItsyBitsy with a dry finger (this only accelerates the "failure"; it sometimes occurs without external noise). Repeat until sensor data is no longer acquired from the board (as seen in the script output).
  4. Kill and re-launch the sensordata script. This causes an IMU axis config command to be sent.
  5. Any other commands will time out, since the cmdctrl thread is blocked. A debugger can be used to see that the usb thread is working properly, but the notification is never serviced (because the cmdctrl thread is blocked).

Expected behavior

MB3hel commented 1 year ago

Further investigation indicates that a bus error state may be indicated via the error interrupt when the bad state is entered. Thus, the lockup may be detectable directly, without relying solely on a timeout.
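A minimal sketch of that idea, modeled on the host rather than on real hardware: the register layout, the bit masks (`SIM_FLAG_DONE`, `SIM_FLAG_ERROR`, `SIM_STATUS_BUSERR`), and the handler name are all illustrative placeholders, not values taken from the SERCOM datasheet or from this firmware. The point is only that the error path also wakes the waiting thread and records why.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit masks; the real positions must come from the datasheet. */
#define SIM_FLAG_DONE     (1u << 0)  /* transfer-complete interrupt flag */
#define SIM_FLAG_ERROR    (1u << 7)  /* error interrupt flag */
#define SIM_STATUS_BUSERR (1u << 0)  /* bus error bit in the status register */

typedef enum {
    XFER_PENDING,
    XFER_DONE,
    XFER_BUS_ERROR,
    XFER_OTHER_ERROR
} xfer_state_t;

/* Hypothetical model of the peripheral plus driver bookkeeping. */
typedef struct {
    uint8_t intflag;     /* simulated interrupt flag register */
    uint16_t status;     /* simulated status register */
    xfer_state_t state;  /* result reported to the waiting thread */
    bool sem_given;      /* models giving the semaphore from the ISR */
} sim_sercom_t;

/* Interrupt handler: on an error interrupt, classify it and still wake the
 * waiter, so a bus error is reported instead of silently stalling. */
void sim_i2c_irq_handler(sim_sercom_t *hw)
{
    assert(hw != NULL);
    if (hw->intflag & SIM_FLAG_ERROR) {
        hw->state = (hw->status & SIM_STATUS_BUSERR) ? XFER_BUS_ERROR
                                                     : XFER_OTHER_ERROR;
        hw->intflag &= (uint8_t)~SIM_FLAG_ERROR;  /* acknowledge the flag */
        hw->sem_given = true;
    } else if (hw->intflag & SIM_FLAG_DONE) {
        hw->state = XFER_DONE;
        hw->intflag &= (uint8_t)~SIM_FLAG_DONE;
        hw->sem_given = true;
    }
    /* No flag set: nothing to do, the transfer stays pending. */
}
```

A thread that wakes up and sees `XFER_BUS_ERROR` could then reset the peripheral and retry or report a failure. Keeping a timeout as a fallback may still be worthwhile, since the comment above only says the error interrupt *may* indicate the bad state.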