PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License

[Bug] Unreliable connection with operating vehicle over microdds #22903

Open Jaeyoung-Lim opened 6 months ago

Jaeyoung-Lim commented 6 months ago

Describe the bug

We have tried using microdds for autonomous operations that are controlled through RViz from the ground.

While ROS 2 seems to work on a single host, our experience with using ROS 2 across multiple hosts, especially over cellular connections, has been extremely disappointing.

In summary, the problems we see fall largely into two parts, and both are related to the reliability of the network.

We have concluded that ROS 2 is in fact unusable as an operational link over the network, and we have reimplemented our stack on ROS 1.

To Reproduce

  1. Run ROS 2 on a companion computer onboard the drone.
  2. Try to connect to any ROS topics on the companion computer over a cellular network.

Expected behavior

Some messages may get lost, but the behavior should reflect what the configured QoS is supposed to provide.

"Reliable" QoS is not at all reliable and, in our experience, is even less reliable than sensor QoS.

Screenshot / Media

No response

Flight Log

Software Version

PX4 main, v1.14

Flight controller

Pixhawk 4

Vehicle type

None

How are the different components wired up (including port information)

No response

Additional context

No response

dirksavage88 commented 6 months ago

@Jaeyoung-Lim is this an issue with the microdds connection or with ROS 2? If images are being sent, I am guessing the ROS 2 traffic is being impacted by the lossy network and the PX4 microdds client is merely in the loop (the microdds connection itself should hopefully not be the cause).

dirksavage88 commented 6 months ago

Also for reference: https://docs.ros.org/en/galactic/Tutorials/Demos/Quality-of-Service.html

I actually need to try reliability:=best_effort

Dealing with losses in wifi networks is a pain, and I often can't record with ros2 bag if I am far from the router. Usually the image messages get dropped. I am able to record odometry and pose though.
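One possible explanation for large image messages being dropped while small odometry and pose messages get through is fragmentation: a message that spans many packets is lost if any one of them is lost and not retransmitted. The sketch below is a rough model only. It assumes independent packet losses, no retransmission (best-effort behavior), and about 1472 bytes of payload per packet (a 1500-byte MTU minus IP and UDP headers); the message sizes and the 2% loss rate are illustrative assumptions, not measurements from this thread.

```python
import math


def fragments_per_message(msg_bytes: int, payload_per_packet: int = 1472) -> int:
    """Packets needed to carry one message (at least one).

    payload_per_packet=1472 is an assumption: 1500-byte MTU minus IP/UDP headers.
    """
    return max(1, math.ceil(msg_bytes / payload_per_packet))


def delivery_probability(msg_bytes: int, packet_loss: float,
                         payload_per_packet: int = 1472) -> float:
    """Chance that every fragment arrives, assuming independent losses
    and no retransmission of lost fragments."""
    n = fragments_per_message(msg_bytes, payload_per_packet)
    return (1.0 - packet_loss) ** n


if __name__ == "__main__":
    loss = 0.02  # hypothetical 2% packet loss on the link
    # Hypothetical message sizes chosen for illustration only.
    for name, size in [("odometry, ~700 B", 700),
                       ("mono 640x480 image, ~300 kB", 640 * 480)]:
        n = fragments_per_message(size)
        p = delivery_probability(size, loss)
        print(f"{name}: {n} packet(s), delivery probability {p:.3f}")
```

Under these assumptions the small message fits in one packet and arrives 98% of the time, while the image needs 209 packets and arrives intact less than 2% of the time. Real DDS implementations fragment and retransmit differently, so treat the numbers as an order-of-magnitude illustration.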

XXLiu-HNU commented 1 week ago

We are currently trying to perform formation flying of drone clusters over WiFi and ROS 2. With 3 drones, the program runs normally and the drones form a formation as expected. But when we increased the number of drones to 6, communication became very slow: we could no longer use ros2 topic echo, and we could not even connect to the remote desktop (NoMachine).

Here are the tests we have conducted (none has had any effect so far):
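One factor that may contribute to the jump from 3 to 6 drones is that DDS discovery traffic tends to grow faster than linearly with the number of participants. The sketch below is a back-of-the-envelope count, assuming one DDS participant per drone in a single domain and default discovery in which every participant learns about every other participant's endpoints. The figure of 20 endpoints per drone is hypothetical, and the function names are made up for this illustration.

```python
def participant_pairs(n_participants: int) -> int:
    """Number of distinct participant pairs that must discover each other."""
    return n_participants * (n_participants - 1) // 2


def remote_endpoint_records(n_participants: int, endpoints_each: int) -> int:
    """Total remote endpoint announcements stored across the whole network,
    assuming every participant learns every endpoint of every other participant."""
    return n_participants * (n_participants - 1) * endpoints_each


if __name__ == "__main__":
    endpoints_each = 20  # hypothetical number of publishers + subscribers per drone
    for n in (3, 6):
        print(f"{n} drones: {participant_pairs(n)} pairs, "
              f"{remote_endpoint_records(n, endpoints_each)} remote endpoint records")
```

With these assumptions, doubling the fleet from 3 to 6 raises the pair count from 3 to 15, so discovery load grows about fivefold rather than twofold. Whether this is the actual cause here is not established by the thread.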

  1. We tried reducing the number of topics published. Even when we publish only the topics in the picture below, the problem still exists:

[image]

  2. We reduced the baud rate of the serial port from 2000000 to 921600;
  3. We tried changing to a better-performing WiFi router;
  4. sudo sysctl net.ipv4.ipfrag_high_thresh=134217728 # (128 MB)

Maybe we can solve this problem by configuring DDS, but we haven't found a suitable method yet. For now, we are referring to this link: [Unconfigured DDS considered harmful to Networks]

At the same time, I am confused. ROS 2 specifically adopted a new communication method that is said to better support distributed systems, but in practice it often does not work. This is a bit funny.
