I am trying to control multiple vehicles in Gazebo with a set of ROS2 nodes. Each vehicle is controlled by a set of six ROS2 nodes. I have followed the multi vehicle simulation with gazebo tutorial and can launch multiple iris drones. I have also followed the RTPS/DDS Interface: PX4-Fast RTPS(DDS) Bridge tutorial and can control a few (i.e., 1-3) iris drones through the micrortps_bridge with my ROS2 nodes.
However, when I try to scale to additional drones, I begin to run into errors. I will describe (a) my setup, (b) the errors I am experiencing, (c) what I have tried to do fix those errors, and (d) the results from some interesting experiments that lead me to believe there is an issue with the micrortps_bridge.
The ROS2 nodes communicate to the vehicles through the micrortps_bridge and everything works fine.
Errors
When I repeat the above steps, but modified to handle 4 drones, Terminal 2 begins to print messages like this:
[ micrortps_agent ] VehicleOdometry publisher matched
[ micrortps__timesync ] RTTI too high for timesync: 103ms
[ micrortps__timesync ] Offset not updated
[ micrortps__timesync ] Timesync offset outlier, discarding
[ micrortps__timesync ] Offset not updated
[ micrortps__timesync ] Timesync offset outlier, discarding
[ micrortps__timesync ] Offset not updated
Terminal 3-5 still run correctly. But Terminal 6 (for drone 4) will print a warning from one of my ROS2 nodes (i.e., swarm_comms) that one of its services (/swarm_comms/swarm_status) is "not available".
Attempted fixes
In urtps_bridge_topics.yaml, there are on the order of 27 messages passed over the micrortps_bridge. If I reduce that to only 10 messages (i.e., the messages my ROS2 nodes use + the timesync and timesync_status messages), and recompile the PX4-Autopilot repo as well as my repo with the px4_msgs and px4_ros_com submodules, I am able to partially remedy the problem described above. Instead of only getting three drones to work, I am now able to get four drones to work.
If I repeat the steps above, but modified to support five drones, I get the same errors described above. This leads me to believe that there is a relationship between the micrortps_bridge throughput and the number of vehicles supported.
I have gone into the px4_ros_com code and modified the limits/logic associated with the "RTTI too high for timesync" and "timesync offset outlier" errors. Instead of an rtti > 50 ms , I changed it to an rtti > 500 ms. I also increased the WINDOW_SIZE from 500 to 5000. I don't get the errors on Terminal 2, but I still get the "/swarm_comms/swarm_status service not available" error.
Experiments
To rule out my computer resources being overwhelmed by the number of ROS2 nodes and ROS2 messages being passed around, I checked gnome-system-monitor. CPU utilization, Memory and Network history all showed low values.
To rule out my the swarm_comms ROS2 node being unprepared for /swarm_comms/swarm_status service calls, I added a delay in all other ROS2 nodes to give swarm_comms time to start up before service calls were made. No change in behavior.
To rule out the /swarm_comms/swarm_status service call from being available, I create a new callback group specifically for that service, but this did not change the behavior:
To rule out ROS2 having some upper limit on the number of ROS2 rosnodes running simultaneously on the same machine (or something else networking-related), I ran the following experiment:
On Terminal 1, I started up the sim with 7 vehicles
On Terminal 2, I did nothing. I did NOT start the micrortps_agents
On Terminals 3-9, I ran the launch file that starts up an instance of the six ROS2 nodes for each of the 7 drones.
Everything worked fine. There were no errors. The /swarm_comms/swarm_status service call worked properly. ROS2 nodes are able to exchange information intra-vehicle, as well as inter-vehicle, as expected.
To drill into the root cause of the issue, I repeated the steps in the previous paragraph, but started the micrortps_agents on Terminal 2. As expected, Terminal 2 printed the same errors as above: "RTTI too high for timesync" and "Timesync offset outlier". When I started Terminal 3 (for drone 1), it ran correctly. When I started Terminal 4 (for drone 2), it printed the "/swarm_comms/swarm_status service not available" error.
Earlier, when I ran these steps (but which were modified for 3 drones in sim + 3 micrortps_agents), everything worked properly. Now, however, with 7 drones in sim + 7 micrortps_agents, it is only Terminal 3 (for drone 1) that works properly. Terminal 4 (for drone 2) produces the "/swarm_comms/swarm_status service not available" error. This leads me to believe that there is a relationship between the micrortps_bridge throughput and the number of vehicles supported.
To further drill into this issue, I modified my launch file (i.e., start_system.py) to only start two rosnodes: swarm_comms + sle (which calls the swarm_comms/swarm_status service) instead of the normal 6 rosnodes. In Terminal 1, I started 7 drones in sim. In Terminal 2, I started 7 micrortps_agents. Terminals 3-6 (for drones 1-4) ran successfully. Terminals 7-9 (for drones 5-7) produced the “swarm_comms/swarm_status service not available” error.
I presume that since only two of my ROS2 nodes are running (instead of all six), they are only requesting/sending a subset of the 10 uorb messages specified in urtps_bridge_topics.yaml. This would result in less data being sent over the micrtortps_bridge. Since this experiment allow for more drones to work, this further leads me to believe that there is a relationship between the micrortps_bridge throughput and things working properly.
The results from my experiments point to there being a problem with the micrortps_bridge supporting multiple vehicles. I am not sure whether it is the micrortps_client, micrortps_agent or something else.
Anything that can be shared to help me understand how to control 7+ vehicles over the micrortps_bridge with ROS2 would be very appreciated!
I am trying to control multiple vehicles in Gazebo with a set of ROS2 nodes. Each vehicle is controlled by a set of six ROS2 nodes. I have followed the multi vehicle simulation with gazebo tutorial and can launch multiple iris drones. I have also followed the RTPS/DDS Interface: PX4-Fast RTPS(DDS) Bridge tutorial and can control a few (i.e., 1-3) iris drones through the
micrortps_bridge
with my ROS2 nodes.However, when I try to scale to additional drones, I begin to run into errors. I will describe (a) my setup, (b) the errors I am experiencing, (c) what I have tried to do fix those errors, and (d) the results from some interesting experiments that lead me to believe there is an issue with the
micrortps_bridge
.Setup
In Terminal 1, I start the PX4-Gazebo simulator:
In Terminal 2, I start the
micrortps_agent
s:In Terminals 3-5, I run a launch file. This launch file starts six ROS2 nodes that collectively control an individual drone:
The ROS2 nodes communicate to the vehicles through the
micrortps_bridge
and everything works fine.Errors
When I repeat the above steps, but modified to handle 4 drones, Terminal 2 begins to print messages like this:
Terminal 3-5 still run correctly. But Terminal 6 (for drone 4) will print a warning from one of my ROS2 nodes (i.e.,
swarm_comms
) that one of its services (/swarm_comms/swarm_status
) is "not available".Attempted fixes
In urtps_bridge_topics.yaml, there are on the order of 27 messages passed over the
micrortps_bridge
. If I reduce that to only 10 messages (i.e., the messages my ROS2 nodes use + thetimesync
andtimesync_status
messages), and recompile the PX4-Autopilot repo as well as my repo with thepx4_msgs
andpx4_ros_com
submodules, I am able to partially remedy the problem described above. Instead of only getting three drones to work, I am now able to get four drones to work.If I repeat the steps above, but modified to support five drones, I get the same errors described above. This leads me to believe that there is a relationship between the
micrortps_bridge
throughput and the number of vehicles supported.I have gone into the
px4_ros_com
code and modified the limits/logic associated with the "RTTI too high for timesync" and "timesync offset outlier" errors. Instead of an rtti > 50 ms , I changed it to an rtti > 500 ms. I also increased theWINDOW_SIZE
from 500 to 5000. I don't get the errors on Terminal 2, but I still get the "/swarm_comms/swarm_status service not available" error.Experiments
To rule out my computer resources being overwhelmed by the number of ROS2 nodes and ROS2 messages being passed around, I checked
gnome-system-monitor
. CPU utilization, Memory and Network history all showed low values.To rule out my the
swarm_comms
ROS2 node being unprepared for/swarm_comms/swarm_status
service calls, I added a delay in all other ROS2 nodes to giveswarm_comms
time to start up before service calls were made. No change in behavior.To rule out the
/swarm_comms/swarm_status
service call from being available, I create a new callback group specifically for that service, but this did not change the behavior:To rule out ROS2 having some upper limit on the number of ROS2 rosnodes running simultaneously on the same machine (or something else networking-related), I ran the following experiment:
Everything worked fine. There were no errors. The
/swarm_comms/swarm_status
service call worked properly. ROS2 nodes are able to exchange information intra-vehicle, as well as inter-vehicle, as expected.To drill into the root cause of the issue, I repeated the steps in the previous paragraph, but started the micrortps_agents on Terminal 2. As expected, Terminal 2 printed the same errors as above: "RTTI too high for timesync" and "Timesync offset outlier". When I started Terminal 3 (for drone 1), it ran correctly. When I started Terminal 4 (for drone 2), it printed the "/swarm_comms/swarm_status service not available" error.
Earlier, when I ran these steps (but which were modified for 3 drones in sim + 3
micrortps_agent
s), everything worked properly. Now, however, with 7 drones in sim + 7micrortps_agent
s, it is only Terminal 3 (for drone 1) that works properly. Terminal 4 (for drone 2) produces the "/swarm_comms/swarm_status service not available" error. This leads me to believe that there is a relationship between themicrortps_bridge
throughput and the number of vehicles supported.To further drill into this issue, I modified my launch file (i.e.,
start_system.py
) to only start two rosnodes:swarm_comms
+sle
(which calls theswarm_comms/swarm_status
service) instead of the normal 6 rosnodes. In Terminal 1, I started 7 drones in sim. In Terminal 2, I started 7 micrortps_agents. Terminals 3-6 (for drones 1-4) ran successfully. Terminals 7-9 (for drones 5-7) produced the “swarm_comms/swarm_status service not available” error.I presume that since only two of my ROS2 nodes are running (instead of all six), they are only requesting/sending a subset of the 10 uorb messages specified in
urtps_bridge_topics.yaml
. This would result in less data being sent over themicrtortps_bridge
. Since this experiment allow for more drones to work, this further leads me to believe that there is a relationship between themicrortps_bridge
throughput and things working properly.The results from my experiments point to there being a problem with the micrortps_bridge supporting multiple vehicles. I am not sure whether it is the
micrortps_client
,micrortps_agent
or something else.Anything that can be shared to help me understand how to control 7+ vehicles over the
micrortps_bridge
with ROS2 would be very appreciated!