StarlingUAS / ProjectStarling

BRL Flight Arena Infrastructure 2.0

ROS2 Discovery Destroying Local Network #164

Open mhl787156 opened 2 years ago

mhl787156 commented 2 years ago

A major problem for multi-robot systems is the lag caused by the DDS discovery mechanism. Through talks with the Robotics Lab DOTS team, the recommendation is to switch to zenoh (https://github.com/eclipse-zenoh/zenoh-plugin-dds) to manage multi-robot connections, rather than using ROS2 DDS directly.

Also: https://zenoh.io/blog/2021-04-28-ros2-integration/

This is a good idea, as it also paves the way for future cloud-based development: Zenoh allows internet-scale approaches.
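To make the scale of the problem concrete: DDS participant discovery (SPDP/SEDP) is roughly all-to-all, so discovery traffic grows quadratically with the number of participants, whereas a routed bridge like zenoh keeps one link per robot. A rough illustration (the nodes-per-vehicle count is an assumption, not a measurement):

```python
# Illustrative only: DDS discovery is ~all-to-all, so the number of
# participant pairs that must discover each other grows as O(N^2).

def dds_discovery_pairs(participants: int) -> int:
    """Pairs of DDS participants that must exchange discovery traffic."""
    return participants * (participants - 1) // 2

# Assume ~5 ROS2 nodes (DDS participants) per vehicle -- a made-up figure.
for vehicles in (1, 2, 5, 10):
    n = vehicles * 5
    print(f"{vehicles} vehicles -> {n} participants -> {dds_discovery_pairs(n)} pairs")
```

With 10 vehicles that is already over a thousand participant pairs all probing each other, which is why discovery traffic can swamp a small wifi network.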

mhl787156 commented 2 years ago

[Image: details from Simon Jones on the DOTS system]

mhl787156 commented 2 years ago

Multiple issues in trying to get zenoh to work, primarily focused on restricting ROS2 DDS to localhost only. See the following issues:

See https://github.com/StarlingUAS/ProjectStarling/pull/170 for the current attempt.

mhl787156 commented 2 years ago

Multiple issues again with the following test environments:

(vehicle) Ping monitor <-> Zenoh bridge <-> Zenoh bridge <-> Ping monitor (server)

vehicle

RMW_IMPLEMENTATION=rmw_cyclonedds_cpp CYCLONEDDS_URI=fastrtps_profiles.xml ./zenoh-bridge-dds -d 0 -l tcp/0.0.0.0:7447 -f -e tcp/192.168.10.80:7447 --no-multicast-scouting

server

RMW_IMPLEMENTATION=rmw_cyclonedds_cpp CYCLONEDDS_URI=fastrtps_profiles.xml ./zenoh-bridge-dds -d 0 -l tcp/0.0.0.0:7447 -f --no-multicast-scouting

CycloneDDS config file (kept under the name fastrtps_profiles.xml for convenience):

<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
      <General>
          <NetworkInterfaceAddress>lo</NetworkInterfaceAddress>
          <AllowMulticast>false</AllowMulticast>
      </General>
      <Discovery>
          <ParticipantIndex>auto</ParticipantIndex>
          <Peers>
              <Peer Address="localhost"/>
          </Peers>
          <MaxAutoParticipantIndex>120</MaxAutoParticipantIndex>
      </Discovery>
  </Domain>
</CycloneDDS>
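As a quick sanity check that the profile above really does pin DDS to loopback and disable multicast, the key settings can be verified programmatically; a sketch (not part of the repo, and the embedded XML is trimmed to the elements being checked):

```python
# Sketch: verify the loopback-only CycloneDDS settings from the profile above.
import xml.etree.ElementTree as ET

CONFIG = """
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain id="any">
      <General>
          <NetworkInterfaceAddress>lo</NetworkInterfaceAddress>
          <AllowMulticast>false</AllowMulticast>
      </General>
      <Discovery>
          <ParticipantIndex>auto</ParticipantIndex>
          <Peers>
              <Peer Address="localhost"/>
          </Peers>
          <MaxAutoParticipantIndex>120</MaxAutoParticipantIndex>
      </Discovery>
  </Domain>
</CycloneDDS>
"""

ns = {"c": "https://cdds.io/config"}
root = ET.fromstring(CONFIG)
# DDS should bind only to loopback, never multicast, and peer only locally.
assert root.find(".//c:NetworkInterfaceAddress", ns).text == "lo"
assert root.find(".//c:AllowMulticast", ns).text == "false"
assert root.find(".//c:Peers/c:Peer", ns).get("Address") == "localhost"
print("config pins DDS to loopback")
```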

Observations:

After several days of testing with no good results, I am binning the use of zenoh.

Will try using the Fast DDS discovery server as a last-ditch attempt.

mhl787156 commented 2 years ago

Can confirm that the Fast DDS discovery server does work, BUT there are two major drawbacks.

Both of these are far from ideal, as it is difficult to produce general configurations with them. They also require a user to perform extra, non-intuitive steps which were not previously required.

@rob-clarke any thoughts on what we might be able to do?

mhl787156 commented 2 years ago

Thoughts on the first point: we can add ROS_DISCOVERY_SERVER to the vehicle config so that it can be pre-configured by us for KIND vs the Flight Arena.

The CLI is still a problem - maybe there needs to be a debug container which contains the fastrtps config file.
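One way the per-environment pre-configuration could look, as a sketch (the target names and the KIND address are hypothetical, not actual Starling config; the Flight Arena address reuses the server IP from this thread):

```python
# Hypothetical sketch of per-deployment discovery server selection.
# Target names and the KIND address are assumptions, not real config.
import os

DISCOVERY_SERVERS = {
    "kind": "172.18.0.1:11811",             # assumed KIND host address
    "flight_arena": "192.168.10.80:11811",  # server address from this thread
}

def configure_discovery(target: str) -> str:
    """Set ROS_DISCOVERY_SERVER (read by the Fast DDS RMW) for a target."""
    server = DISCOVERY_SERVERS[target]
    os.environ["ROS_DISCOVERY_SERVER"] = server
    return server

print(configure_discovery("flight_arena"))
```

Baking this into the vehicle container entrypoint would avoid the user-facing, non-intuitive setup step for ROS nodes, though it does not solve the CLI problem.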

mhl787156 commented 2 years ago

Going back to running without the discovery server for a bit to work out what is going on.

PX4 Broadcasting

First, I have identified that PX4 broadcasting via MAVROS accounted for a large portion of the broadcast (dest 255.255.255.255) packets on the network, at 300+ packets/s per vehicle. Compared to the baseline level of traffic (10-30 packets/s) this feels significant, and at the very least it is contributing to quicker multicast degradation, if that is happening.
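Back-of-envelope for what this means fleet-wide (the 300 pkt/s and 30 pkt/s figures are from the capture above; the rest is arithmetic):

```python
# Broadcast is delivered to every station on the segment, so every Pi's
# wifi stack has to process the whole fleet's MAVLink broadcast traffic.
BASELINE_PPS = 30               # upper bound of observed background traffic
MAVLINK_PPS_PER_VEHICLE = 300   # observed broadcast rate per vehicle

def broadcast_pps(vehicles: int) -> int:
    """Total broadcast packets/s seen by each host on the network."""
    return BASELINE_PPS + vehicles * MAVLINK_PPS_PER_VEHICLE

print(broadcast_pps(1))  # 330 pkt/s
print(broadcast_pps(6))  # 1830 pkt/s
```

So even a modest six-vehicle flight pushes every host on the segment well over a thousand broadcast packets per second, dwarfing the background traffic.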

[Screenshot from 2022-08-13 20-43-46]

Have resolved this by disabling broadcast, changing MAVROS_GCS_URL=udp-pb://@:14550 to udp://@:14550. QGC can connect directly to the vehicle using a udp connection and a base URL of <vehicle ip>:14555 (the default remote port according to MAVROS).

Vicon traffic

The second thing identified is that there appears to be double the expected Vicon traffic. Currently we expect Vicon to be broadcast at 30 Hz (30 packets/s) from the Vicon machine to 255.255.255.255, and indeed that can be seen in the image below (the yellow line). About halfway through the capture I switch Vicon on and you can see a step in overall traffic (black line, approx 60 packets/s).

[Screenshot from 2022-08-17 15-49-05]

The question is where the extra 30 packets/s come from, as they are being sent from 192.168.10.10 -> 192.168.10.255; see the logs on the left. On further inspection they look like identification packets from the cameras - plotting bytes, we see that Vicon requires a stream of roughly 32 kB/s.

[Screenshot from 2022-08-17 18-15-28]
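A quick sanity check on those numbers: ~60 packets/s total at ~32 kB/s implies an average packet size of roughly 533 bytes, which is plausible for camera identification payloads.

```python
# Average Vicon packet size implied by the capture figures above.
packets_per_s = 60        # broadcast pose + camera identification packets
bytes_per_s = 32_000      # ~32 kB/s stream observed when plotting bytes
avg_packet_bytes = bytes_per_s / packets_per_s
print(round(avg_packet_bytes))  # ~533 bytes/packet
```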

Multiple vehicles x2

The good news is that RTTs with multiple vehicles do stabilise between 1.5 ms and 5 ms, with occasional spikes.

mhl787156 commented 2 years ago

Network Bandwidth

Further to the previous comment, I have been testing system bandwidth and have identified that we are not close to hitting any limits. Testing was done using iperf3 over the network between devices.

Raspberry Pi network bandwidth looks to be capped at between 45 and 55 Mbit/s.


The wired connection is confirmed to be 1 Gbit/s.

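Putting the earlier traffic figures against these measured caps supports the "not close to any limits" conclusion; a rough headroom check (the ~100-byte MAVLink packet size is an assumption):

```python
# Rough headroom check: observed streams vs the Pi's measured wifi cap.
WIFI_CAP_MBPS = 45.0                  # lower bound measured with iperf3
vicon_mbps = 32_000 * 8 / 1e6         # ~32 kB/s Vicon stream
mavlink_mbps = 300 * 100 * 8 / 1e6    # 300 pkt/s at an assumed ~100 B/packet
total = vicon_mbps + mavlink_mbps
print(f"{total:.3f} Mbit/s of {WIFI_CAP_MBPS} Mbit/s")  # ~0.5 of 45
```

Raw throughput is therefore not the bottleneck; the problem is packet rate and how multicast/broadcast frames are handled, not bytes on the wire.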

Have also verified the router settings, and there appear to be no multicast-related settings to change.

After some reading I have identified that low data rates appear to be detrimental to multicast, therefore I have enabled the higher data rates. Note, however, that multicast will still use the lowest enabled data rate. For the future, many people recommend multicast-to-unicast support on wifi routers.
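The reason low rates hurt is airtime: multicast/broadcast frames are transmitted at the lowest enabled basic rate, so each one occupies the channel far longer than a unicast frame at a modern rate. A simplified illustration (ignores preambles, ACKs, and contention; the 500-byte frame size is an assumption):

```python
# Simplified airtime model: transmission time = frame bits / data rate.
FRAME_BITS = 500 * 8   # assumed 500-byte multicast frame

def airtime_us(rate_mbps: float) -> float:
    """Microseconds of airtime for one frame at the given data rate."""
    return FRAME_BITS / rate_mbps  # bits / (Mbit/s) == microseconds

print(airtime_us(1.0))   # 4000.0 us at a 1 Mbit/s basic rate
print(airtime_us(24.0))  # ~167 us at 24 Mbit/s
```

At a 1 Mbit/s basic rate, each multicast frame costs roughly 24x the airtime of the same frame at 24 Mbit/s, which is why disabling the lowest basic rates (or multicast-to-unicast conversion) helps so much.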
