MB3hel / AUVControlBoard

Vectored 6-DOF motion controller for AUVs.
https://mb3hel.github.io/AUVControlBoard/

[BUG] Sending commands too fast crashes control board #24

Closed by MB3hel 1 year ago

MB3hel commented 1 year ago

Describe the bug: Sending SASSIST1 messages too fast causes the control board to deadlock, which then triggers a watchdog reset.

Tested Versions: v1, v2

Impacted Versions: v1, v2

To Reproduce

  1. Flash a release build (v1). Debug builds are also affected, but they do not reset because the WDT is disabled in debug builds.
  2. Send SASSIST1 messages as fast as possible (don't wait for ACK).
  3. Wait until serial communication drops.
  4. Check the reset cause using ./launch.py whyreset. It will be -6.

Expected behavior: No deadlock or reset occurs.

MB3hel commented 1 year ago

This can also be reproduced using the Python interface, which does wait for acknowledgement.

test.py

#!/usr/bin/env python3

from control_board import ControlBoard, Simulator
import random
import time

def run(cb: ControlBoard, s: Simulator) -> int:
    # Confirm the board responds and that both sensors are available
    res, imu, depth = cb.get_sensor_status()
    if res != cb.AckError.NONE:
        print("Fail")
        return 1
    print("Done")
    if not imu or not depth:
        print("Missing sensor(s).")
        return 1

    # Tune the stability assist controllers before sending targets
    cb.tune_sassist_pitch(4.0, 0.0, 0.0, 1.0, False)
    cb.tune_sassist_roll(4.0, 0.0, 0.0, 1.0, False)
    cb.tune_sassist_yaw(4.0, 0.0, 0.0, 1.0, False)
    cb.tune_sassist_depth(4.0, 0.0, 0.0, 1.0, False)

    # Send SASSIST1 targets back to back with a random depth in [-10.0, -0.1]
    while True:
        target_depth = random.randint(-100, -1) / 10.0
        cb.set_sassist1(0.0, 0.0, 0.0, 0.0, 0.0, target_depth)
        # Sleep time between messages; 0 means no added delay
        time.sleep(0)

With the sleep time changed to 0.02 or 0.05 the problem can still be reproduced, but it takes much longer on average.
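To put a number on "much longer on average", a small helper can count how many commands go through before the first failure at a given delay. This is only a sketch: `count_until_failure` and the `send` callable are hypothetical names, not part of the control board interface, and wiring it to the board assumes `cb.set_sassist1` returns an `AckError` the same way `get_sensor_status` does, which is not shown in this thread.

```python
import time
from typing import Callable


def count_until_failure(send: Callable[[], bool], delay: float,
                        max_sends: int = 10_000) -> int:
    """Call send() repeatedly, sleeping `delay` seconds after each success.

    Returns the number of successful sends before the first failure,
    or max_sends if no failure was seen.
    """
    for sent in range(max_sends):
        if not send():
            # `sent` successful calls happened before this failing one
            return sent
        time.sleep(delay)
    return max_sends


if __name__ == "__main__":
    # Stand-in sender that fails on its 4th call (hypothetical, no hardware)
    calls = {"n": 0}

    def fake_send() -> bool:
        calls["n"] += 1
        return calls["n"] <= 3

    print(count_until_failure(fake_send, delay=0.0))
```

With a board attached, `send` could be something like `lambda: cb.set_sassist1(0.0, 0.0, 0.0, 0.0, 0.0, -1.0) == cb.AckError.NONE` (again assuming that return value), run once per delay of 0, 0.02 and 0.05.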

MB3hel commented 1 year ago

From a debug build, execution seems to end up on line 168 of thirdparty/FreeRTOS/list.c. This has a lovely note...

[image: note at that line of list.c]

And the following call stack:

[image: call stack]

MB3hel commented 1 year ago

This can also be triggered with a global mode set or a local mode set, and with non-speed-set commands such as sensor status queries.

MB3hel commented 1 year ago

Seems to be caused by the use of a critical section in pccomm_write. Replaced it with a mutex on the fix_fast_msg branch.
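As a rough illustration of the idea (not the firmware change itself, which is C on FreeRTOS and is not shown in this thread), the sketch below guards a shared write path with a mutex in Python. A mutex only blocks the tasks contending for the writer, whereas a critical section holds off interrupts and the scheduler for its whole duration. `SerialWriter`, `write`, and `demo` are hypothetical names chosen for the example.

```python
import threading


class SerialWriter:
    """Hypothetical stand-in for a shared write path such as pccomm_write."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.buffer = bytearray()

    def write(self, msg: bytes) -> None:
        # Only threads that want to write wait here; everything else keeps running
        with self._lock:
            for b in msg:
                self.buffer.append(b)


def demo(n_threads: int = 4, per_thread: int = 500) -> SerialWriter:
    """Have several threads write 4-byte messages through one writer."""
    writer = SerialWriter()

    def worker(tag: int) -> None:
        for _ in range(per_thread):
            writer.write(bytes([tag]) * 4)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return writer


if __name__ == "__main__":
    w = demo()
    # Every 4-byte message should be intact, i.e. made of one repeated tag
    chunks = [w.buffer[i:i + 4] for i in range(0, len(w.buffer), 4)]
    print(all(len(set(c)) == 1 for c in chunks))
```

Because every message is appended while the lock is held, messages from different threads never interleave inside the buffer.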