Open mhl787156 opened 2 years ago
Details from Simon Jones with the DOTS
Multiple issues in trying to get Zenoh to work, primarily focused on trying to get ROS 2 DDS to work only over localhost. See https://github.com/StarlingUAS/ProjectStarling/pull/170 for the current attempt.
Multiple issues again with the following test environment:
(vehicle) Ping monitor <-> Zenoh bridge <-> Zenoh bridge <-> Ping monitor (server)
vehicle:

```shell
RMW_IMPLEMENTATION=rmw_cyclonedds_cpp CYCLONEDDS_URI=fastrtps_profiles.xml ./zenoh-bridge-dds -d 0 -l tcp/0.0.0.0:7447 -f -e tcp/192.168.10.80:7447 --no-multicast-scouting
```

server:

```shell
RMW_IMPLEMENTATION=rmw_cyclonedds_cpp CYCLONEDDS_URI=fastrtps_profiles.xml ./zenoh-bridge-dds -d 0 -l tcp/0.0.0.0:7447 -f --no-multicast-scouting
```
CycloneDDS config file (named fastrtps_profiles.xml for convenience):
```xml
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <General>
            <NetworkInterfaceAddress>lo</NetworkInterfaceAddress>
            <AllowMulticast>false</AllowMulticast>
        </General>
        <Discovery>
            <ParticipantIndex>auto</ParticipantIndex>
            <Peers>
                <Peer Address="localhost"/>
            </Peers>
            <MaxAutoParticipantIndex>120</MaxAutoParticipantIndex>
        </Discovery>
    </Domain>
</CycloneDDS>
```
Observations:
`-f` is required for it to work; without `-f` it can't find the topics. Also tried `--group-lease <duration>` with the duration set to 10 (instead of the default of 3), but that drastically reduces the frequency of RTT packets to a fifth of what it should be (from 5 Hz to 1 Hz) for some unknown reason. After several days of testing with no good results, I am binning the use of Zenoh.
Will try using the fast-rtps discovery server as a last-ditch attempt.
Can confirm that the FastDDS discovery server does work, BUT there are two major drawbacks. Clients must set
`ROS_DISCOVERY_SERVER=<server_ip>:11811`
where `server_ip` must be set to the IP address of the server; a hostname (e.g. flyingserver.local) does not work. This means that users must reconfigure this value when switching between Kind and the flight arena. Both of these are highly non-ideal, as it is difficult to produce general configurations, and they require a user to take extra non-intuitive steps which were not previously required.
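For reference, the setup being described looks roughly like this. This is a sketch only: the server IP is reused from the earlier bridge commands as an example, and the `fastdds` CLI flags may differ between Fast DDS versions.

```shell
# On the server machine (example IP, taken from the earlier bridge commands):
fastdds discovery -i 0 -l 192.168.10.80 -p 11811

# On each vehicle, in every shell that launches ROS 2 nodes:
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
export ROS_DISCOVERY_SERVER=192.168.10.80:11811
```

Note the drawback described above: the `192.168.10.80` value has to change per network, so it cannot live in a single general-purpose configuration.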
@rob-clarke any thoughts on what we might be able to do?
Thoughts for the first point, we can add ROS_DISCOVERY_SERVER to vehicle config so that it can be pre-configured by us for KIND vs Flight Arena.
CLI is still a problem - maybe there needs to be a Debug container which contains the fastrtps config file.
Going back to without discovery server for a bit to work out what is going on.
First, have identified that PX4 broadcasting via MAVROS accounted for a large portion of the broadcast (dest 255.255.255.255) packets on the network (300+ packets/s per vehicle). Compared to the base level of 10-30 packets/s this feels significant, and at the very least it is contributing to quicker multicast degradation, if that is happening.
Have resolved this by disabling broadcast, changing `MAVROS_GCS_URL=udp-pb://@:14550` to `udp://@:14550`. QGC can connect directly to the vehicle using a UDP connection with a base URL of `<vehicle ip>:14555` (the default remote port according to MAVROS).
Second thing identified is that there appears to be double the Vicon traffic compared to what is expected. Currently we expect Vicon to be broadcast at 30 Hz (30 packets/s) from the Vicon machine to 255.255.255.255, and indeed that can be seen in the image below (the yellow line). In the image, at about halfway I switch Vicon on and you see a step in overall traffic (black line, approx 60 packets/s).
The question is where the extra 30 packets/s come from, as they are being sent from 192.168.10.10 to 192.168.10.255; see the logs on the left. On further inspection they look like identification packets from the cameras. Plotting bytes, we see that Vicon requires a roughly 32 kilobyte/s stream.
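As a quick sanity check on these numbers (the packet rate and byte rate are taken from the plots above; the rest is arithmetic):

```shell
# ~60 broadcast packets/s total (30 Hz Vicon data + ~30 Hz camera
# identification packets) carrying a ~32 kB/s stream.
packets_per_s=60
stream_bytes_per_s=32000
# Average packet size implied by the two measurements:
echo "$(( stream_bytes_per_s / packets_per_s )) bytes/packet"   # ~533 bytes
```

An average of roughly half a kilobyte per packet is plausible for pose data plus camera identification messages, so the two measurements are at least consistent with each other.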
Good news is that RTTs from multiple vehicles do stabilise between 1.5 ms and 5 ms, with occasional spikes.
Further to the previous comment, I have been testing system bandwidth and have identified that we are not close to hitting any limits, testing with `iperf3` over the network between devices.
Raspberry Pi network bandwidth looks to be capped between 45 and 55 Mbit/s.
The wired connection is confirmed to be 1 Gbit/s.
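The bandwidth testing above was done along these lines (a sketch: the IP address is an example from this network, and the durations/rates are illustrative `iperf3` options):

```shell
# On the server / ground station:
iperf3 -s

# On a vehicle (Raspberry Pi), measure TCP throughput to the server:
iperf3 -c 192.168.10.80 -t 30

# UDP at a fixed offered rate, closer in character to the broadcast traffic:
iperf3 -c 192.168.10.80 -u -b 50M
```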
Also have verified the router settings, and there appear to be no multicast-related settings to change.
After some reading I have identified that low data rates appear to be detrimental to multicast, so I have enabled only the higher data rates; multicast will, however, be sent at the lowest data rate still enabled. For the future, lots of people recommend multicast-to-unicast support on Wi-Fi routers.
A major problem for multi-robot systems is the lag of the DDS discovery mechanism. Through talks with the Robotics Lab's Dots team, they recommend switching to Zenoh (https://github.com/eclipse-zenoh/zenoh-plugin-dds) to manage multi-robot connections, rather than using ROS 2 DDS directly.
Also: https://zenoh.io/blog/2021-04-28-ros2-integration/
This is a good idea, as it also paves the way for future cloud-based development: Zenoh allows internet-scale approaches.