LCAS / limo_ros2

limo ros2 packages
[BUG]: DDS Multi Client Limited Bandwidth #20

Open jondave opened 8 months ago

jondave commented 8 months ago

Description of the bug

When two or more clients run RViz through the zenoh bridge at the same time, topic messages are dropped.

Suggested fix: change <ParticipantIndex>auto</ParticipantIndex> to <ParticipantIndex>120</ParticipantIndex> in setup-router.sh: https://github.com/LCAS/limo_ros2/blob/humble/.devcontainer/setup-router.sh#L28C27-L28C31
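
For context, here is a minimal sketch of where this setting lives, assuming the script writes a Cyclone DDS XML file and points CYCLONEDDS_URI at it. The file location, the PARTICIPANT_INDEX variable, and the surrounding XML are illustrative assumptions, not the actual contents of setup-router.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical variable; "auto" is the value the issue proposes to replace.
PARTICIPANT_INDEX="${PARTICIPANT_INDEX:-auto}"

# Hypothetical location for the generated config file.
CONFIG_DIR="$(mktemp -d)"
CONFIG_FILE="${CONFIG_DIR}/cyclonedds.xml"

# Write a minimal Cyclone DDS config containing only the discovery setting.
cat > "${CONFIG_FILE}" <<EOF
<CycloneDDS>
  <Domain id="any">
    <Discovery>
      <ParticipantIndex>${PARTICIPANT_INDEX}</ParticipantIndex>
    </Discovery>
  </Domain>
</CycloneDDS>
EOF

# Cyclone DDS reads its configuration from the URI in this variable.
export CYCLONEDDS_URI="file://${CONFIG_FILE}"
echo "CYCLONEDDS_URI=${CYCLONEDDS_URI}"
```

Running it with `PARTICIPANT_INDEX=<value>` set in the environment would write that value instead of `auto`, which makes it easy to compare settings without editing the script.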

@cooperj, @GPrathap

Steps To Reproduce

Multiple students connect to the same robot at the same time.

Additional Information

No response

marc-hanheide commented 8 months ago

I'm not sure this is the core of the problem.

We run Zenoh in peer-to-peer mode. To have multiple clients we'd have to run it with a Zenoh server, I think. Good to know, but not a bug.

But please feel free to open a PR with the suggested change if it helped.

marc-hanheide commented 8 months ago

I need to ask what 120 achieves here. The documentation at https://github.com/eclipse-cyclonedds/cyclonedds/blob/master/docs%2Fmanual%2Foptions.md#cycloneddsdomaindiscoverymaxautoparticipantindex suggests this is not a valid option. How did you come across this solution?

jondave commented 8 months ago

@GPrathap said he uses this solution in another ROS2 project.

marc-hanheide commented 8 months ago

interesting 🤔 @GPrathap want to comment?

GPrathap commented 8 months ago

Hi @marc-hanheide, I am not entirely sure this resolves the issue, but it is worth trying. When I ran RViz, the frequencies of some messages were dropping. After setting this value, that was fixed.

Also, the DDS tuning guide at https://docs.ros.org/en/galactic/How-To-Guides/DDS-tuning.html recommends reducing net.ipv4.ipfrag_time, but if you reduce it, some of the messages are no longer visible in RViz. I keep it at the default value, i.e., 30.

GPrathap commented 8 months ago

Maybe it is this: https://robotics.stackexchange.com/questions/25075/cyclonedds-can-see-topic-when-using-ros2-topic-list-but-only-can-echo-some-of

marc-hanheide commented 8 months ago

Yes, setting a fixed participant ID can theoretically help with performance, but 120 seems to be the first illegal value, so I'm not sure why it was chosen.
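
To make the range concern concrete, here is a small sketch of a check that could run before the value is written into the config. It assumes, from my reading of the Cyclone DDS options documentation linked earlier in this thread, that ParticipantIndex accepts `auto`, `none`, or a non-negative integer below 120. The function name is hypothetical.

```shell
#!/usr/bin/env bash

# Returns 0 if the value is accepted by this sketch, 1 otherwise.
# Assumption: valid values are "auto", "none", or an integer in 0..119.
is_valid_participant_index() {
  local value="$1"
  case "${value}" in
    auto|none) return 0 ;;
  esac
  # Reject anything that is not a plain decimal number.
  [[ "${value}" =~ ^[0-9]+$ ]] || return 1
  # Force base 10 so a leading zero is not read as octal.
  (( 10#${value} < 120 ))
}

for candidate in auto 0 119 120; do
  if is_valid_participant_index "${candidate}"; then
    echo "${candidate}: accepted"
  else
    echo "${candidate}: rejected"
  fi
done
```

Under that assumption, 119 would be the largest fixed index that passes, and 120 would be rejected.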

marc-hanheide commented 8 months ago

I don't think we need to tune the kernel network params here, as we are not really using DDS over the network. DDS runs only on the local loopback device, since only zenoh is used externally?

GPrathap commented 8 months ago

Even on a single machine, you have to set sudo sysctl -w net.core.rmem_max=2147483647. Have a look here: https://github.com/SteveMacenski/spatio_temporal_voxel_layer/issues/257
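
Before changing the kernel setting, it may help to see what the current limit is. The sketch below reads net.core.rmem_max from procfs and prints the command quoted above only if the limit is lower. The helper name is hypothetical, and whether this limit matters for a loopback-only setup is not settled in this thread.

```shell
#!/usr/bin/env bash

# Value quoted in the comment above.
WANTED_RMEM_MAX=2147483647

# Returns 0 when the current limit is below the wanted one.
needs_larger_rmem() {
  local current="$1" wanted="$2"
  (( current < wanted ))
}

# Read the current limit from procfs; fall back to 0 if it is not readable
# (for example inside some containers).
current="$(cat /proc/sys/net/core/rmem_max 2>/dev/null || echo 0)"

if needs_larger_rmem "${current}" "${WANTED_RMEM_MAX}"; then
  echo "net.core.rmem_max is ${current}; to raise it, run:"
  echo "  sudo sysctl -w net.core.rmem_max=${WANTED_RMEM_MAX}"
else
  echo "net.core.rmem_max is already ${current}"
fi
```

Note that a value set with `sysctl -w` does not survive a reboot unless it is also added to a sysctl configuration file.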

marc-hanheide commented 8 months ago

I toyed around with the simplest CYCLONEDDS settings in https://github.com/LCAS/teaching/pull/44. They appear to make things much more performant for me, but I have only just started investigating.

marc-hanheide commented 7 months ago

I think https://github.com/LCAS/teaching/pull/49 is actually the correct way to address this. We do want multicast inside the container, and it will stay confined to the container anyway.