ApolloAuto / apollo-platform

Collections of Apollo Platform Software

FastRTPS participantID needs to be set explicitly #53

Open jilinzhou opened 6 years ago

jilinzhou commented 6 years ago

I have only come across this issue on QNX. Basically, the running nodes do not see each other (they cannot be discovered) because every participant has the same participant ID. The related file is "apollo-platform/ros/ros_comm/roscpp/src/libros/discovery/participant.cpp". Here is the diff of a possible fix:

diff --git a/ros/ros_comm/roscpp/src/libros/discovery/participant.cpp b/ros/ros_comm/roscpp/src/libros/discovery/participant.cpp
index b987738..83a1608 100644
--- a/ros/ros_comm/roscpp/src/libros/discovery/participant.cpp
+++ b/ros/ros_comm/roscpp/src/libros/discovery/participant.cpp
@@ -102,6 +102,8 @@ bool Participant::init(user_callback cb)
     }
   }

+  srand(time(NULL));
+
   eprosima::fastrtps::ParticipantAttributes participant_param;
   participant_param.rtps.defaultSendPort = 50000;
   participant_param.rtps.use_IP6_to_send = false;
@@ -112,6 +114,7 @@ bool Participant::init(user_callback cb)
   participant_param.rtps.builtin.domainId = domain_id;
   participant_param.rtps.builtin.leaseDuration = c_TimeInfinite;
   participant_param.rtps.builtin.leaseDuration_announcementperiod.seconds = 3;
+  participant_param.rtps.participantID = rand() % 100 + 1;
   participant_param.rtps.setName(_name.c_str());
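
The patch seeds `rand()` with `time(NULL)`, so two nodes started within the same second get the same seed and would draw the same participant ID. The following is a minimal sketch of one variation, assuming POSIX `getpid()` is available (it is on Linux and QNX): it mixes the process ID into the seed. `make_seed` and `participant_id_from_seed` are hypothetical helpers, not part of apollo-platform or Fast RTPS.

```cpp
#include <cstdint>
#include <ctime>
#include <random>
#include <unistd.h>  // getpid(): POSIX, available on Linux and QNX

// Hypothetical helper: draw an ID in [1, 100] (the same range as the
// patch's rand() % 100 + 1) from the given seed.
int participant_id_from_seed(std::uint32_t seed) {
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(1, 100);
  return dist(gen);
}

// Hypothetical helper: combine the wall-clock second with the process ID.
// pid ^ (pid << 16) is an invertible xorshift, so distinct PIDs launched in
// the same second always produce distinct seeds.
std::uint32_t make_seed() {
  const auto now = static_cast<std::uint32_t>(std::time(nullptr));
  const auto pid = static_cast<std::uint32_t>(getpid());
  return now ^ pid ^ (pid << 16);
}
```

Mixing in the PID only removes the shared-seed case. With just 100 possible values, two nodes can still draw the same ID by chance, so setups with many nodes may be better served by assigning distinct IDs explicitly.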

sagniknitr commented 6 years ago

Hi @jilinzhou, does this patch solve the QNX discovery problem?

What if I run the Fast RTPS publisher and subscriber on the same QNX machine? Do I need to set any parameter so that the pub-sub pair uses the local loopback interface to send messages?

jilinzhou commented 6 years ago

Yes, it solves the participant discovery problem. If I remember correctly, there is no need to set any parameters when both run on the same machine.

KevinYuk commented 6 years ago

Hi @jilinzhou, I ran into a participant discovery problem in the following scenario:

  1. rosrun pb_msgs_example pb_talker on one PC running Ubuntu;
  2. rosrun pb_msgs_example pb_listener on another PC, inside Docker, running Ubuntu;

However, if I use standard ROS (i.e., centralized ROS), the same scenario works fine.

My question is: is it possible to configure Baidu ROS (based on FastRTPS) without making the code change you listed?

jilinzhou commented 6 years ago

@KevinYuk: the environment variable ROS_DOMAIN_ID has to be the same on both hosts; otherwise the participants cannot see each other. Just my guess, though!

Before launching anything, on host one:

export ROS_DOMAIN_ID=5000

on host two:

export ROS_DOMAIN_ID=5000

Then start roscore and pb_talker on one host (assuming all other environment variables are already sourced), and pb_listener on the other host. You can use the "rostopic list" and "rostopic echo <topic>" tools to verify that everything works as expected.
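
Why the domain ID must match can be seen from how RTPS derives its UDP ports: the domain ID selects the discovery ports, and the participant ID offsets the unicast ports. The sketch below computes these well-known ports using the default parameters from the OMG DDS-RTPS specification. It assumes the Fast RTPS build in Baidu ROS keeps those defaults (the patch above does customize some settings, such as `defaultSendPort = 50000`). `compute_ports` and `fits_in_udp_port` are hypothetical helpers, not Fast RTPS APIs.

```cpp
#include <cstdint>

// Default port parameters from the OMG DDS-RTPS specification.
// Assumption: the Fast RTPS build used by Baidu ROS keeps these defaults.
constexpr std::uint32_t kPortBase = 7400;        // PB
constexpr std::uint32_t kDomainIdGain = 250;     // DG
constexpr std::uint32_t kParticipantIdGain = 2;  // PG
constexpr std::uint32_t kOffsetD0 = 0;           // d0: discovery multicast
constexpr std::uint32_t kOffsetD1 = 10;          // d1: discovery unicast
constexpr std::uint32_t kOffsetD2 = 1;           // d2: user-data multicast
constexpr std::uint32_t kOffsetD3 = 11;          // d3: user-data unicast

struct RtpsPorts {
  std::uint32_t discovery_multicast;
  std::uint32_t discovery_unicast;
  std::uint32_t user_multicast;
  std::uint32_t user_unicast;
};

// Hypothetical helper: well-known ports for a (domain, participant) pair.
RtpsPorts compute_ports(std::uint32_t domain_id, std::uint32_t participant_id) {
  const std::uint32_t base = kPortBase + kDomainIdGain * domain_id;
  return RtpsPorts{
      base + kOffsetD0,
      base + kOffsetD1 + kParticipantIdGain * participant_id,
      base + kOffsetD2,
      base + kOffsetD3 + kParticipantIdGain * participant_id,
  };
}

// Hypothetical helper: UDP port numbers are 16-bit.
bool fits_in_udp_port(std::uint32_t port) { return port <= 65535; }
```

Participants in different domains listen on different discovery multicast ports, so they never meet. By the same formula, two participants in the same domain with the same participant ID compute identical unicast ports. With these defaults, a domain ID of 5000 would give 7400 + 250 * 5000 = 1,257,400, which is above the 16-bit UDP limit. This thread does not show whether Baidu ROS passes ROS_DOMAIN_ID through to Fast RTPS unchanged.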

KevinYuk commented 6 years ago

Hi @jilinzhou, Thanks for your comments. I use the same ROS_DOMAIN_ID on both hosts.

I modified the code as you listed on both nodes (which run on different hosts), but participant discovery still fails. By printing the value, I found that participant_param.rtps.participantID now differs between the nodes, but it still doesn't work.

BTW, can you show me the complete rostopic commands you use to diagnose this issue?

Thanks a lot.

For your reference, below is my config:

Physical host A:
declare -x ROSLISP_PACKAGE_DIRECTORIES=""
declare -x ROS_DISTRO="indigo"
declare -x ROS_DOMAIN_ID="5000"
declare -x ROS_ETC_DIR="/home/carla/baidu_ros/fix_ros/apollo-platform/ros/install/ros_x86_64/etc/ros"
declare -x ROS_HOSTNAME="10.239.161.138"
declare -x ROS_MASTER_URI="http://10.239.161.138:11311"
declare -x ROS_PACKAGE_PATH="/home/carla/baidu_ros/fix_ros/apollo-platform/ros/install/ros_x86_64/share:/home/carla/baidu_ros/fix_ros/apollo-platform/ros/install/ros_x86_64/stacks:/home/carla/baidu_ros/ros_x86_64_installed/share:/home/carla/baidu_ros/ros_x86_64_installed/stacks:/home/tmp/ros/share:/home/tmp/ros/stacks"
declare -x ROS_ROOT="/home/carla/baidu_ros/fix_ros/apollo-platform/ros/install/ros_x86_64/share/ros"

Physical host B:
declare -x ROSLISP_PACKAGE_DIRECTORIES=""
declare -x ROS_DISTRO="indigo"
declare -x ROS_DOMAIN_ID="5000"
declare -x ROS_ETC_DIR="/apollo/baidu_ros_in_docker/fix_ros_in_docker/copy_from_other_build/install/ros_x86_64/etc/ros"
declare -x ROS_HOSTNAME="172.18.0.1"
declare -x ROS_IP="10.239.12.129"
declare -x ROS_MASTER_URI="http://10.239.161.138:11311"
declare -x ROS_PACKAGE_PATH="/apollo/baidu_ros_in_docker/fix_ros_in_docker/copy_from_other_build/install/ros_x86_64/share:/apollo/baidu_ros_in_docker/fix_ros_in_docker/copy_from_other_build/install/ros_x86_64/stacks:/home/tmp/ros/share:/home/tmp/ros/stacks"
declare -x ROS_ROOT="/apollo/baidu_ros_in_docker/fix_ros_in_docker/copy_from_other_build/install/ros_x86_64/share/ros"

With the configuration above and standard ROS (not Baidu ROS), the whole system works fine. With the same configuration and Baidu ROS, it fails.

jilinzhou commented 6 years ago

[Screenshot from 2018-07-19 13-48-49]

I do not think it is related to the participant ID issue I came across before. One way to debug is to build libfastrtps in debug mode with -DCMAKE_BUILD_TYPE=Debug and enable its internal logging with -DINTERNAL_DEBUG=ON. Then, in the test program main.cc, set the verbosity level: Log::SetVerbosity(Log::Kind::Info);

By default, all Fast RTPS internal log messages go to the console.

Good luck.

KevinYuk commented 6 years ago

@jilinzhou Thanks. Are your talker and listener on two different hosts? Thanks a lot.

jilinzhou commented 6 years ago

Yes. One is running inside a Linux Apollo Docker container and the other on a QNX target.

KevinYuk commented 6 years ago

Hmm, interesting; it still doesn't work. On the host where your Apollo Docker container resides, is it necessary to install Baidu ROS on both the physical machine and in Docker at the same time, or only in Docker? Thanks.

Note: of my two host machines, one runs Ubuntu 14.04 (Trusty) and the other Ubuntu 16.04 (Xenial). I only installed Baidu ROS in the Docker container on Ubuntu 16.04.

jilinzhou commented 6 years ago

You have to run Baidu ROS on both hosts.

darbee commented 6 years ago

Hi @jilinzhou, how did you cross-compile FastRTPS for QNX? Can you give me some help? Thanks a lot.