bluerobotics / BlueOS

The open source platform for ROV, USV, robotic system operation, development, and expansion.
https://blueos.cloud/docs/
Other
140 stars 70 forks source link

bug: some MAVLink messages have CRC errors #1740

Open clydemcqueen opened 1 year ago

clydemcqueen commented 1 year ago

Bug description

I have found that some MAVLink2 messages generated by BlueOS, sent via mavlink2rest, and written to a tlog file by QGC, and read by pymavlink, have bad CRC values. I am not exactly sure where the bug lies, but I thought I'd share my findings to see if others can help narrow it down.

Steps to reproduce

I am looking at ~20 tlog files from 3 different BlueROV2 systems:

I created a Python tool to help me examine BAD_DATA messages, and found a few patterns:

Here is a typical run of the tool I'm using:

$ tlog_bad_data.py -r .
Processing 4 files
-------------------
Results for ./2023_05_04/tlog/2023-05-04 10-20-04.tlog
msg_id 259 count 1
msg_id 260 count 1
msg_id 262 count 1
msg_id 269 count 1
4 BAD_DATA messages, 4 of them were CRC errors
-------------------
Results for ./2023_05_04/tlog/2023-05-04 10-41-19.tlog
msg_id 21 count 4
msg_id 77 count 4
msg_id 132 count 1730
msg_id 232 count 15
msg_id 259 count 3
msg_id 260 count 3
msg_id 262 count 3
msg_id 269 count 3
1765 BAD_DATA messages, 1765 of them were CRC errors
-------------------
Results for ./2023_05_04/tlog/2023-05-04 11-18-48.tlog
msg_id 21 count 2
msg_id 76 count 2
msg_id 77 count 4
msg_id 132 count 1382
msg_id 232 count 83
msg_id 259 count 2
msg_id 260 count 2
msg_id 262 count 2
msg_id 269 count 2
1481 BAD_DATA messages, 1481 of them were CRC errors
-------------------
Results for ./2023_05_04/tlog/2023-05-04 19-02-21.tlog
msg_id 21 count 1
msg_id 77 count 1
msg_id 132 count 3268
msg_id 259 count 2
msg_id 260 count 2
msg_id 262 count 2
msg_id 269 count 2
3278 BAD_DATA messages, 3278 of them were CRC errors

FYI that I am running a patched version of pymavlink, without this patch pymavlink will get confused by the BAD_DATA messages and crash.

Primary pain point(s)

At the moment this just makes log analysis a challenge. (I haven't yet traced these messages through ArduSub to see if they are causing problems in ArduSub.)

Prerequisites

clydemcqueen commented 1 year ago

I am tracing this down as time permits. I can reproduce the problem by calling mavlink2rest using curl. E.g., this will generate a "bad" message:

curl --verbose http://127.0.0.1:6040/mavlink -H "accept: application/json" --data \
'{
    "header": {
        "system_id": 1,
        "component_id": 194,
        "sequence": 52
    },
    "message": {
        "type": "DISTANCE_SENSOR",
        "time_boot_ms": 201509,
        "min_distance": 20,
        "max_distance": 5000,
        "current_distance": 2325,
        "mavtype": {"type": "MAV_DISTANCE_SENSOR_ULTRASOUND"},
        "id": 1,
        "orientation": {"type": "MAV_SENSOR_ROTATION_PITCH_270"},
        "covariance": 255,
        "horizontal_fov": 0.52,
        "vertical_fov": 0.52,
        "quaternion": [0, 0, 0, 0],
        "signal_quality": 0
    }
}'

Pymavlink is happy to receive and unpack this message live, but it barfs when reading the QGC-generated tlog file that contains this message. So I think it's a bug in QGC. If you don't mind, I'll keep this issue open as I continue isolating the problem.

clydemcqueen commented 1 year ago

OK, I think I have figured this out. This occurs when you send a MAVLink2 message with trailing 0's. For DISTANCE_SENSOR messages, this happens when signal_quality is 0.

Then:

clydemcqueen commented 1 year ago

The QGC bug is actually a pymavlink bug: https://github.com/ArduPilot/pymavlink/issues/237