ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0

DroneCAN: MAVLink via DroneCAN issue, CAN bus stops #28187

Open olliw42 opened 1 week ago

olliw42 commented 1 week ago

Bug report

I cannot demonstrate with 100.00% certainty that what I'm reporting is indeed a bug in ArduPilot and not a bug in my code, but having spent a lot of effort on the issue, I think it could be a bug in ArduPilot.

The situation is an ArduPilot flight controller with a receiver (mLRS), which is connected via CAN bus to the FC, and both are set up to do MAVLink over DroneCAN. That is, the MAVLink communication is FC <-> DroneCAN <-> receiver <-> over the air <-> tx module in a radio <-> UART-USB-TTL adapter <-> MissionPlanner.

The issue is that when MissionPlanner sends a CMD_CAN_FRAME to ArduPilot, and ArduPilot starts to stream CAN_FRAME messages, the CAN bus stops working after a bunch of CAN_FRAME messages from ArduPilot. Only repowering the ArduPilot FC resolves the issue.
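To pin down the moment the stream stops, a small watchdog on the ground-station side can help. The following is a minimal sketch, not anything taken from ArduPilot or MissionPlanner: the class name `FrameStallDetector` and the 2-second timeout are assumptions chosen for illustration, and timestamps are passed in explicitly so the logic can be checked without hardware.

```python
import time
from typing import Optional


class FrameStallDetector:
    """Flags a stalled stream when no frame arrived within timeout_s seconds.

    Hypothetical helper for illustration; the timeout is an assumed value.
    """

    def __init__(self, timeout_s: float = 2.0) -> None:
        self.timeout_s = timeout_s
        self.frame_count = 0
        self.last_frame_time: Optional[float] = None

    def on_frame(self, now: Optional[float] = None) -> None:
        """Record the arrival of one CAN_FRAME message."""
        self.last_frame_time = time.monotonic() if now is None else now
        self.frame_count += 1

    def is_stalled(self, now: Optional[float] = None) -> bool:
        """True if frames were flowing earlier but none arrived within the timeout."""
        if self.last_frame_time is None:
            return False  # stream never started, so it cannot have stalled
        current = time.monotonic() if now is None else now
        return (current - self.last_frame_time) > self.timeout_s


# Example with explicit timestamps (seconds):
detector = FrameStallDetector(timeout_s=2.0)
detector.on_frame(now=10.0)
detector.on_frame(now=10.5)
print(detector.frame_count, detector.is_stalled(now=13.0))
```

In a real setup, `on_frame()` would be called for each received CAN_FRAME message, for example from a pymavlink `recv_match(type='CAN_FRAME')` loop, and `frame_count` at the time of the stall gives a number for the "bunch of CAN_FRAME messages".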

Issue details

Since it's a pretty complex situation, and I find it difficult to describe in words, I made a video to show the details and the issue.

https://youtu.be/og06AvoDTdw

I am DEEPLY sorry for the low quality/unsharpness in the video. I just have a pretty old crappy mobile, and it's terrible at autofocusing so I put it into manual focus, but maybe it wasn't so smart.

I want to add these details not shown in the video:

I think the issue is specific to the situation that one is doing MAVLink over DroneCAN and CAN_FRAMEs via the very same path, i.e. DroneCAN over MAVLink via MAVLink over DroneCAN.
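One way to see why this nesting could matter is a back-of-the-envelope frame count. The sketch below is a rough estimate under stated assumptions, not a description of what ArduPilot actually does: it assumes unsigned MAVLink 2 framing (10-byte header plus 2-byte checksum), a full 16-byte CAN_FRAME payload, classic CAN with DroneCAN's 7 payload bytes plus 1 tail byte per frame, and a 2-byte CRC on multi-frame transfers. The `tunnel_overhead` parameter is hypothetical and stands in for whatever extra bytes the tunnel message adds. Whether the tunnel frames themselves get forwarded back as CAN_FRAME messages is not established in this thread.

```python
import math

MAVLINK2_OVERHEAD = 12  # assumed: 10-byte header + 2-byte checksum, no signature
CAN_FRAME_PAYLOAD = 16  # assumed maximum: 4 x uint8 fields + uint32 id + 8 data bytes


def dronecan_frames_for(payload_len: int) -> int:
    """Classic-CAN frames needed for one DroneCAN transfer of payload_len bytes."""
    if payload_len <= 7:
        return 1  # single-frame transfer: up to 7 payload bytes + 1 tail byte
    # multi-frame transfer: a 2-byte transfer CRC is added, 7 bytes per frame
    return math.ceil((payload_len + 2) / 7)


def loop_gain(tunnel_overhead: int = 0) -> int:
    """CAN frames needed on the bus to carry one forwarded CAN frame.

    tunnel_overhead is a hypothetical allowance for tunnel message fields.
    """
    mavlink_len = MAVLINK2_OVERHEAD + CAN_FRAME_PAYLOAD
    return dronecan_frames_for(mavlink_len + tunnel_overhead)


print(loop_gain())  # frames on the bus per forwarded frame, under the assumptions above
```

Under these assumptions, one forwarded CAN frame costs about five CAN frames on the bus. If those frames were forwarded in turn, the traffic would grow with every pass, which would only occur when both mechanisms share the same path.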

Version This is my fork of ArduPilot, which I just updated to today's ArduPilot master, and I also included the two PRs https://github.com/ArduPilot/ardupilot/pull/28182 and https://github.com/ArduPilot/ardupilot/pull/28157. However, these did not change anything; I have seen the issue unchanged for about 3 months, i.e., none of the changes in ArduPilot since then are related to it.

It's compiled for Plane.

Platform [ ] All [ ] AntennaTracker [ ] Copter [x] Plane [ ] Rover [ ] Submarine

It presumably occurs for all platforms.

Airframe type irrelevant

Hardware type Matek H743 slim, and a Matek mLRS mR900-30 receiver modified for CAN bus operation (code here https://github.com/olliw42/mLRS/tree/dev-dronecan)

Logs none, see video however

olliw42 commented 1 week ago

comment: this issue https://github.com/ArduPilot/ardupilot/issues/28175 isn't relevant. When I connect to the FC, e.g. via its USB port, and connect MissionPlanner, I do see the nodes just fine: [screenshot]

tridge commented 1 week ago

@olliw42 I can't immediately see the cause, but the main suggestion I would make in debugging this is to try to reproduce in SITL. SITL supports MAVCAN, and also supports CAN over multicast UDP, so you can monitor with two DroneCAN monitors, one on multicast and one on MAVCAN. You can also much more easily inspect the state of the flight controller with a debugger. I also do suggest you test with #28182 applied. The bug it fixes is a bit more subtle than the issue suggests. MAVCAN did work before that fix for the straightforward case, so your test that you see the nodes via MAVCAN doesn't actually show you are not suffering from that bug with your more complex setup.

tridge commented 1 week ago

note that you can bridge SITL CAN onto real CAN devices using the dronecan_bridge.py tool from pydronecan. This allows for a mix of hardware CAN devices and simulated devices, which is very useful for development

olliw42 commented 6 days ago

Many thanks for looking at it.

I'm afraid I have to say that what you suggest for debugging is way above my paygrade and beyond my reach. It would open a whole new area for me, which would simply overburden me. (I also feel that the ArduPilot side of things is not exactly my side of things, and I'm sufficiently happy with my workaround to prevent the issue.)

I do have, btw, a vehicle with a good number of real CAN devices. I have done only light testing with it so far, and should do more, but the presence of further CAN devices does not seem to affect the story. My next steps will be to do more regular flight tests.

I actually did apply the PR https://github.com/ArduPilot/ardupilot/pull/28182, but it did not change anything; it's the same (as mentioned in the intro).

The main problem I have with the issue is that I simply can't imagine any reason for it. Even if it were my code, I would not understand why ArduPilot would stop sending. I have also implemented every error test in my code that I could imagine or find suggested on the internet, and none triggers. It is, e.g., not a BUS OFF state. As said, it seems to me that it is specific to the situation of doing DroneCAN over MAVLink via a MAVLink over DroneCAN channel.
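For readers following along, BUS OFF is only one of three fault-confinement states a CAN controller can be in, and it may be worth logging which one applies when the stream stops. The following is a minimal sketch of the standard classification from the transmit and receive error counters. How those counters are read on the hardware used here is not covered in this thread, so the function simply takes them as inputs; the function name is an assumption for illustration.

```python
def can_error_state(tec: int, rec: int) -> str:
    """Classify a CAN node's fault-confinement state from its error counters.

    tec: transmit error counter, rec: receive error counter.
    Thresholds follow the standard CAN fault-confinement rules.
    """
    if tec > 255:
        return "bus-off"        # node has disconnected itself from the bus
    if tec > 127 or rec > 127:
        return "error-passive"  # node still communicates, with restrictions
    return "error-active"       # normal operation


# Example readings (made-up values for illustration):
for counters in [(0, 0), (130, 0), (256, 0)]:
    print(counters, can_error_state(*counters))
```

A node that is error-passive rather than bus-off still transmits, so a complete stop with none of these states triggering would point away from the physical layer and toward the software path.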