eclipse-cyclonedds / cyclonedds

Eclipse Cyclone DDS project
https://projects.eclipse.org/projects/iot.cyclonedds

tev: ddsi_udp_conn_write failed with retcode -3 #2024

Open tonynajjar opened 5 months ago

tonynajjar commented 5 months ago

Error I get

1717770916.976931 [0]        tev: ddsi_udp_conn_write to udp/192.168.0.51:7602 failed with retcode -3
1717770916.976943 [0]        tev: ddsi_udp_conn_write to udp/192.168.0.51:7604 failed with retcode -3
1717770916.976960 [0]        tev: ddsi_udp_conn_write to udp/192.168.0.51:7606 failed with retcode -3
1717770916.976976 [0]        tev: ddsi_udp_conn_write to udp/192.168.0.51:7608 failed with retcode -3
1717770916.976995 [0]        tev: ddsi_udp_conn_write to udp/192.168.0.51:7610 failed with retcode -3
1717770916.977010 [0]        tev: ddsi_udp_conn_write to udp/192.168.0.51:7612 failed with retcode -3
1717770916.977022 [0]        tev: ddsi_udp_conn_write to udp/192.168.0.51:7614 failed with retcode -3
....

Details

I'm running ROS 2 Humble with Cyclone DDS on two computers.

The computers are connected via Ethernet. The Ethernet interface of Computer 1 is called eth0 and that of Computer 2 is called enp89s0. I can ping them, and I can even receive UDP packets when running echo "Test message" | nc -u -w1 192.168.0.51 7648, so I know the ports are not blocked. One more detail: the ROS nodes are running in Docker containers, but they share the host's network, so that shouldn't be a source of problems as far as I know.
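The nc check above sends a single UDP datagram. A minimal C sketch of the same kind of probe, which additionally returns the errno reported by sendto, might look like the following. The helper name udp_probe is hypothetical, it handles IPv4 only, and it is not how Cyclone DDS itself sends data.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send one UDP datagram to ip:port.
   Returns 0 on success, -1 if ip is not a valid IPv4 address,
   otherwise the errno value reported by socket() or sendto(). */
static int udp_probe(const char *ip, uint16_t port, const char *msg)
{
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return errno;

    int result = 0;
    if (sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&dst, sizeof dst) < 0)
        result = errno; /* save errno before close() can overwrite it */
    close(fd);
    return result;
}
```

Calling udp_probe("192.168.0.51", 7648, "Test message") and passing a positive result to strerror() would show which system error, if any, the kernel reports for a plain send to that address.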

Computer 1

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
    <General>
      <Interfaces>
        <NetworkInterface name="lo"/>
        <NetworkInterface name="eth0"/>
      </Interfaces>
      <AllowMulticast>false</AllowMulticast>
    </General>
    <Discovery>
      <ParticipantIndex>auto</ParticipantIndex>
      <Peers>
        <Peer Address="192.168.0.50"/>
        <Peer Address="localhost"/>
      </Peers>
      <MaxAutoParticipantIndex>120</MaxAutoParticipantIndex>
    </Discovery>
  </Domain>
</CycloneDDS>

Computer 2

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
    <General>
      <Interfaces>
        <NetworkInterface name="lo"/>
        <NetworkInterface name="enp89s0"/>
      </Interfaces>
      <AllowMulticast>false</AllowMulticast>
    </General>
    <Discovery>
      <ParticipantIndex>auto</ParticipantIndex>
      <Peers>
        <Peer Address="192.168.0.51"/>
        <Peer Address="localhost"/>
      </Peers>
      <MaxAutoParticipantIndex>120</MaxAutoParticipantIndex>
    </Discovery>
  </Domain>
</CycloneDDS>

I'd really appreciate the help. I'm trying to switch from FastDDS as it's causing issues, and I wouldn't like to give up on Cyclone as well. Thanks

tonynajjar commented 5 months ago

Looking around a bit, could it be related to having multiple network interfaces?

https://github.com/ros2/rmw_cyclonedds/issues/455
https://github.com/ros2/rmw_cyclonedds/issues/459
https://github.com/eclipse-cyclonedds/cyclonedds/issues/1915
https://github.com/eclipse-cyclonedds/cyclonedds/issues/1190
https://github.com/eclipse-cyclonedds/cyclonedds/issues/1422

videh25 commented 2 months ago

Facing the same error on Ubuntu 20.04 and ROS Foxy.

btbrucethompson commented 2 months ago

I am having the same issue, but with a C program that uses cyclonedds 0.11 (i.e. not a ROS 2 program) on arm Ubuntu 20.04. It passes this XML to the dds_create_domain() function:

<General><AllowMulticast>false</AllowMulticast><Interfaces><NetworkInterface name="lo"/><NetworkInterface name="eth0"/></Interfaces></General><Domain><Discovery><ParticipantIndex>auto</ParticipantIndex><Peers><Peer Address="127.0.0.1"/><Peer Address="192.168.1.114"/></Peers></Discovery></Domain>

I also tried <Peers AddLocalhost="true"> and did not put the loopback address in the <Peer> list. I get the same error.

Cyclone outputs a bunch of lines that look like these:

1725464190.844856 [0] tev: ddsi_udp_conn_write to udp/192.168.1.114:7414 failed with retcode -3
1725464190.844917 [0] tev: ddsi_udp_conn_write to udp/192.168.1.114:7416 failed with retcode -3
1725464190.844963 [0] tev: ddsi_udp_conn_write to udp/192.168.1.114:7418 failed with retcode -3

eboasson commented 2 weeks ago

I generally try to ignore only the errors from sendmsg that I know to be harmless. I did something on [September 6](https://github.com/eclipse-cyclonedds/cyclonedds/commit/695c3b2e45ef4dbf1b82a70ab493828a92a05e34) where I added EADDRNOTAVAIL as a harmless error. It probably is that; I just wonder whether it was triggered subconsciously by this issue or a true independent discovery ...

That commit is part of PR https://github.com/eclipse-cyclonedds/cyclonedds/pull/2086, which I finally merged last week. That is not of much use to ROS 2, because it is not trivially backported to 0.10.x and I still need to do a bit of work for ROS 2 and master, but you could give Cyclone master a try if you also merge https://github.com/ros2/rmw_cyclonedds/pull/501 into the Cyclone RMW layer.

@btbrucethompson for you, I think current master is worth a try, because my merge was after your comment ...
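
The idea of ignoring only known-harmless send errors, described in the comment above, can be illustrated with a small C sketch. This is not the code from the referenced commit: the function name send_error_is_harmless is hypothetical, and apart from EADDRNOTAVAIL (the one error named in the thread) the errno values listed are assumptions chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classifier, not Cyclone DDS code: decides whether a failed
   send should be silently ignored (true) or logged as an error (false). */
static bool send_error_is_harmless(int err)
{
    switch (err) {
    case EADDRNOTAVAIL: /* the error named in the thread */
    case ENETUNREACH:   /* assumption: included for illustration only */
    case EHOSTUNREACH:  /* assumption: included for illustration only */
        return true;
    default:
        return false;   /* anything unknown is still reported */
    }
}
```

Keeping the list explicit means any error not yet known to be harmless still shows up in the log, which matches the stated approach of ignoring only errors known to be harmless.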