ArduPilot / ardupilot

ArduPlane, ArduCopter, ArduRover, ArduSub source
http://ardupilot.org/
GNU General Public License v3.0

AP_DDS: Automatic reconnect to MicroROS Agent not working #23372

Closed Ryanf55 closed 12 months ago

Ryanf55 commented 1 year ago

If the connection between the autopilot and the companion computer is flaky or severed, or if the micro-ROS agent restarts at runtime, the connection is not recovered.

The scope of this issue is to perform the following

vibgyor-s commented 1 year ago

Currently facing an issue with running the micro-ROS agent with SITL (UDP): the micro-ROS agent terminates with "bad_array_new_length", so no topics are published on ROS 2.

I can't even restart the micro-ROS agent to fix this, because reconnection is then not possible. What could be the reason for the following error?


    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: Frame: QUAD/PLUS
    [micro_ros_agent-1] [1694444766.334808] info | Root.cpp | create_client | create | client_key: 0xAAAABBBB, session_id: 0x81
    [micro_ros_agent-1] [1694444766.335212] info | SessionManager.hpp | establish_session | session established | client_key: 0xAAAABBBB, address: 127.0.0.1:36817
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3]
    [micro_ros_agent-1] terminate called after throwing an instance of 'std::bad_array_new_length'
    [micro_ros_agent-1] what(): std::bad_array_new_length
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: ArduPilot Ready
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: AHRS: DCM active
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: DDS Client: Init Complete
    [ERROR] [micro_ros_agent-1]: process has died [pid 90931, exit code -6, cmd '/home/vibsin/workspace/DroneSim/ros2_ardup_ws/install/micro_ros_agent/lib/micro_ros_agent/micro_ros_agent udp4 --middleware dds --port 2019 --refs /home/vibsin/workspace/DroneSim/ros2_ardup_ws/install/ardupilot_sitl/share/ardupilot_sitl/config/dds_xrce_profile.xml --ros-args -r node:=micro_ros_agent -r ns:=/'].
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: XRCE Client: Participant session request failure
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: DDS Client: Creation Requests failed
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: RC7: SaveWaypoint LOW
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] paramftp: bad count 1327 should be 1325
    [mavproxy.py --out 127.0.0.1:14550 --out 127.0.0.1:14551 --master tcp:127.0.0.1:5760 --sitl 127.0.0.1:5501 --non-interactive -3] AP: ArduCopter V4.5.0-dev (768e2409)

srmainwaring commented 1 year ago

@vibgyor-s I can't tell what might be causing this from the log you posted. Could you post all the steps required to replicate it (terminal commands and the full log) and some details about the system you're running?

srmainwaring commented 1 year ago

Notes

The PX4 uxrce_dds_client has some support for reconnecting to the micro-ROS agent if the connection is dropped. It makes use of the uxr ping functions declared in uxr/client/util/ping.h to monitor the connection status:

To implement similar behaviour in ArduPilot AP_DDS, we need the following:

Tracking in: https://github.com/ArduPilot/ardupilot/pull/25228

Issues

1. micro-ROS agent is restarted

Testing

Figure: reconnection after the micro-ROS agent is repeatedly restarted.

KyleJewiss commented 1 year ago

Hi @srmainwaring, I've merged your "pr_dds_reconnect" branch. I'm seeing a strange issue: once I disconnect the DDS client, it reports "disconnecting", but after a couple of seconds it "exits". After this exit I can't reconnect to the client without a power reset. Any help would be awesome, cheers.


srmainwaring commented 1 year ago

Hi @KyleJewiss, thanks for testing the PR. The timeout after 10 seconds is intentional.

If the connection cannot be re-established within 10 s (10 ping attempts with a 1000 ms timeout each), the loop exits.

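        // check ping: up to ping_max_attempts pings, each waiting up to
        // ping_timeout_ms for a reply (about 10 s in total) before giving up
        const uint64_t ping_timeout_ms{1000};
        const uint8_t ping_max_attempts{10};
        if (!uxr_ping_agent_attempts(comm, ping_timeout_ms, ping_max_attempts)) {
            // no response from the agent: report to the GCS and leave the client loop
            GCS_SEND_TEXT(MAV_SEVERITY_ERROR, "DDS Client: No ping response, exiting");
            return;
        }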

We need to implement fall-back behaviour in a future PR.

KyleJewiss commented 1 year ago

Good to know. Thanks for the code and the reply, @srmainwaring. Have a good one.

srmainwaring commented 1 year ago

Btw, were you testing in SITL or on hardware?

At the moment the client can reconnect if the micro-ROS agent dies and is respawned within 10 s.

Unplugging and reconnecting a USB-to-serial adapter that connects a flight controller to a PC does not work. I have not tested a connection between an FCU and the GPIO pins of a companion computer such as an RPi4.

KyleJewiss commented 1 year ago

We were testing on hardware, so that makes sense. We can close the agent and reconnect within those 10 seconds, but if it takes longer, we need to unplug and plug back in.

Ryanf55 commented 11 months ago

This is related to the unresolved micro-ROS Agent issue: https://github.com/micro-ROS/micro-ROS-Agent/issues/205