eclipse-zenoh / zenoh-plugin-ros2dds

A Zenoh plug-in for ROS2 with a DDS RMW. See https://discourse.ros.org/t/ros-2-alternative-middleware-report/ for the advantages of using this plugin over other DDS RMW implementations.
https://zenoh.io

[Bug] Closing transport with multiple bridges/subscribers connected #314

Open miltzhaw opened 3 weeks ago

miltzhaw commented 3 weeks ago

Describe the bug

I have a robot running the latest release of zenoh-bridge-ros2dds, connected as a client to a Zenoh router in a Kubernetes cluster. In the same cluster, a container runs another zenoh-bridge-ros2dds that also connects to that router as a client. From it I can see the robot's topics and, for instance, use RViz with Nav2 to visualize and move the robot.
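For reference, a client-mode bridge of the kind described above can be configured roughly as follows. This is a minimal sketch that generates a Zenoh JSON config from Python. The router endpoint is a hypothetical placeholder, and only the standard `mode` and `connect.endpoints` keys plus an empty `ros2dds` plugin section are set; treating that as "near-default" is an assumption.

```python
import json

# Hypothetical router address: replace with the real service name and port in the cluster.
ROUTER_ENDPOINT = "tcp/zenoh-router.example.svc:7447"


def client_bridge_config(router_endpoint: str) -> dict:
    """Return a minimal Zenoh config dict for a bridge running in client mode."""
    return {
        # Run the bridge as a Zenoh client rather than as a peer or router.
        "mode": "client",
        # Connect only to the shared router.
        "connect": {"endpoints": [router_endpoint]},
        # Plugin section left at its defaults (assumption).
        "plugins": {"ros2dds": {}},
    }


if __name__ == "__main__":
    cfg = client_bridge_config(ROUTER_ENDPOINT)
    # JSON is a subset of JSON5, so this output can be saved as a .json5 config file.
    print(json.dumps(cfg, indent=2))
```

Each bridge (robot and cloud containers) would use the same shape, differing only in the router endpoint if needed.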

When I start another zenoh-bridge-ros2dds in a second container, connected to the same router, and visualize the topics there, I get the following error and the zenoh-bridge-ros2dds on the robot stops working.

ERROR ThreadId(19) zenoh_transport::unicast::universal::tx: Unable to push non droppable network message to acac40b9496508dc4cf792ca876954fc. Closing transport!
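The message suggests that the transmission queue toward that peer stayed full long enough that a message which may not be dropped could not be enqueued, after which the whole transport was closed. The toy model below illustrates that behavior under simplifying assumptions: `ToyTransport`, `capacity` and `wait_s` are hypothetical names, and this is not Zenoh's actual implementation, which uses its own per-priority queues and congestion-control settings.

```python
import queue


class ToyTransport:
    """Hypothetical toy model of a bounded TX queue toward a single peer (not Zenoh code)."""

    def __init__(self, capacity: int, wait_s: float):
        self.tx = queue.Queue(maxsize=capacity)  # bounded transmission queue
        self.wait_s = wait_s                     # how long a non-droppable push may wait
        self.closed = False

    def push(self, msg, droppable: bool) -> bool:
        """Try to enqueue msg; return True on success."""
        if self.closed:
            return False
        try:
            if droppable:
                # Droppable messages are discarded immediately if the queue is full.
                self.tx.put_nowait(msg)
            else:
                # Non-droppable messages wait for free space, but only up to wait_s.
                self.tx.put(msg, timeout=self.wait_s)
            return True
        except queue.Full:
            if not droppable:
                # Mirrors the logged behavior: give up and close the transport.
                self.closed = True
            return False
```

In this model nothing drains the queue, standing in for a slow or under-resourced receiver: once it is full, a droppable message is simply lost, while a non-droppable one closes the transport after `wait_s`.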

Note: I also observed this behavior without starting a second container, when adding a second subscriber to a topic (for instance with ros2 topic echo) while RViz was already running. In this case, however, it occurs inconsistently: sometimes it works, sometimes it does not.

I also noticed two warning messages that seem to be related:

WARN net-0 ThreadId(10) zenoh::net::runtime::orchestrator: Unable to connect to tcp/ip:port! Received a close message (reason MAX_LINKS) in response to an OpenSyn on: TransportLinkUnicast { link: Link { src: tcp/ip:port, dst: tcp/ip:port, mtu: 64995, is_reliable: true, is_streamed: true }, config: TransportLinkUnicastConfig { direction: Outbound, batch: BatchConfig { mtu: 64995, is_streamed: true, is_compression: false }, priorities: None, reliability: None } } at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-1.0.0/src/unicast/establishment/open.rs:472.

WARN net-0 ThreadId(09) zenoh_plugin_ros2dds::route_service_cli: Route Service Client (ROS:/summit/lifecycle_manager_navigation/is_active <-> Zenoh:bot1/summit/lifecycle_manager_navigation/is_active): received error as reply for (2c2adf3057843613,26): ReplyError { payload: ZBytes(ZBuf { slices: [[54, 69, 6d, 65, 6f, 75, 74]] }), encoding: Encoding(Encoding { id: 0, schema: None }) }
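The payload of the ReplyError above is printed as raw hex bytes. Decoding them as text shows that the service reply was a timeout, which is consistent with a bridge or router that is too slow to answer. A minimal Python sketch of the decoding:

```python
# Byte values copied from the ReplyError payload in the warning above (hex).
PAYLOAD_HEX = ["54", "69", "6d", "65", "6f", "75", "74"]


def decode_payload(hex_bytes):
    """Interpret a list of hex byte strings as UTF-8 text."""
    return bytes(int(b, 16) for b in hex_bytes).decode("utf-8")


print(decode_payload(PAYLOAD_HEX))  # prints "Timeout"
```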

To reproduce

  1. Start zenoh-bridge-ros2dds on the robot with a near-default configuration, connecting to a Zenoh router in client mode.
  2. Start zenoh-bridge-ros2dds in a container in the cloud with a near-default configuration, connecting to the same Zenoh router in client mode.
  3. Start RViz (e.g., with Nav2) in the cloud container.
  4. Repeat steps 2 and 3 in another container in the cluster.

System info

  - Robot: ROS 2 Humble container with zenoh-bridge-ros2dds 1.0.0 (latest stable)
  - Kubernetes container: ROS 2 Humble with zenoh-bridge-ros2dds 1.0.0 (latest stable)
  - Zenoh router: 1.0.0 (latest stable)

miltzhaw commented 3 weeks ago

An update on this issue: I tested previous versions to identify the release in which this error first appears. It seems to appear in 1.0.0-beta.4; 1.0.0-beta.3 does not show the error described above.
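Finding the first affected release, as described above, is a version bisection. The sketch below shows a binary search over an ordered release list. `RELEASES` and the `observed` outcomes are hypothetical stand-ins: only 1.0.0-beta.3 (good), 1.0.0-beta.4 (bad) and 1.0.0 (bad) come from this issue, and the earlier betas are assumed good. In practice `is_bad` would mean "deploy this version and try to reproduce".

```python
from typing import Callable, Sequence

# Hypothetical, chronologically ordered release list (earlier betas are assumptions).
RELEASES = ["1.0.0-beta.1", "1.0.0-beta.2", "1.0.0-beta.3", "1.0.0-beta.4", "1.0.0"]


def first_bad(releases: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first release where is_bad() is True.

    Assumes all releases before the first bad one are good and all after it are bad.
    """
    lo, hi = 0, len(releases) - 1
    if not is_bad(releases[hi]):
        raise ValueError("newest release is not bad; nothing to bisect")
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid      # first bad release is at mid or earlier
        else:
            lo = mid + 1  # first bad release is after mid
    return releases[lo]


# Hypothetical reproduction outcomes consistent with the report.
observed = {
    "1.0.0-beta.1": False,
    "1.0.0-beta.2": False,
    "1.0.0-beta.3": False,
    "1.0.0-beta.4": True,
    "1.0.0": True,
}
print(first_bad(RELEASES, observed.__getitem__))  # prints 1.0.0-beta.4
```

Because the error was reported to occur inconsistently in some setups, a single passing run is weak evidence that a version is good; repeating each check several times makes the bisection more reliable.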

miltzhaw commented 3 days ago

A follow-up: the issue seems to be related to the limited resources allocated to the Zenoh router container on Kubernetes. What are the minimum memory and CPU requirements for the router? Knowing this would be helpful.