fkie / multimaster_fkie

ROS stack with FKIE packages for multi-robot (discovering, synchronizing and management GUI)
BSD 3-Clause "New" or "Revised" License

Publisher using remote roscore #127

Closed dseifert closed 4 years ago

dseifert commented 4 years ago

I have a particular setup that is causing me a headache. I am trying to figure out whether the behavior I see is an issue in multimaster or in ROS, or whether it's an unsupported use case.

Basically, my setup consists of two computers A and B (forming one robot) that use multimaster to share certain topics over a local network. I now want to use rviz on my laptop (C) to see sensor data, etc. As I am using some plugins in rviz that need data from rosparams, it is easiest to connect rviz to the rosmaster on B (export ROS_MASTER_URI=...). This works perfectly fine. However, one of the rviz plugins also publishes data that is consumed on B. A and B both publish on that same topic (it's a kind of status/log system). Now, when I restart multimaster (on either A or B), the subscriber on B drops its connection to the publisher on C, and C's data no longer arrives. If I restart the publisher, everything is fine. Unfortunately, restarting the publisher means restarting rviz, which is quite annoying ;-)

Also, I never see the publisher from C listed as a publisher on A.

To make it hopefully easier to understand/reproduce, here's a MWE (if needed, I can upload a zip with the launch files and the listener package)

  1. A and B run roscore, master_discovery_fkie, master_sync with sync_topics:=[/test]
  2. B subscribes to /test (using C++ node "data_listener", which is effectively the ROS Tutorial subscriber)
  3. B publishes on /test using rostopic pub /test ...
  4. C runs export ROS_MASTER_URI=http://ip_of_b:11311 and then rostopic pub /test ...
    • I see the output of both publishers on B.
  5. I restart master_discovery_fkie and master_sync on either A or B (it doesn't matter which)
    • Output of C's publisher vanishes on B
    • Reason: when enabling debug, I see a publisher update where C's publisher is missing, causing the subscriber to stop listening for it
      [DEBUG] [1590427799.126129693]: /data_listener: Received update for topic [/test] (2 publishers)
      [DEBUG] [1590427799.126192631]: /data_listener: Publisher update for [/test]: http://B:35467/, http://A:46341/,  already have these connections: http://B:35467/, http://C:46543/, 
      [DEBUG] [1590427799.126225323]: /data_listener: Disconnecting from publisher [/rostopic_22035_1590427744189] of topic [/test] at [http://C:46543/]

      This only happens if the subscriber is written in C++. If it is written in Python, it works as expected (hence it is not reproducible with rostopic echo /test).
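The disconnect in the debug output above can be modeled as a plain set difference between the publisher list in the update and the connections the subscriber already holds. The snippet below is only a rough sketch of that idea, assuming the subscriber treats the update as the complete publisher list; it is not roscpp's actual code, and the function name apply_publisher_update is made up for illustration. The URIs are the ones from the log.

```python
def apply_publisher_update(connected, update):
    """Model a subscriber reacting to a publisher update.

    connected: publisher URIs the subscriber is currently connected to
    update:    publisher URIs listed in the update from the master
    Returns (to_connect, to_disconnect) as sorted lists.
    Assumption: the update is treated as the full list of publishers,
    so anything missing from it is dropped.
    """
    connected = set(connected)
    update = set(update)
    to_connect = sorted(update - connected)      # new publishers to contact
    to_disconnect = sorted(connected - update)   # publishers no longer listed
    return to_connect, to_disconnect


# URIs taken from the debug log above
connected = ["http://B:35467/", "http://C:46543/"]
update = ["http://B:35467/", "http://A:46341/"]

to_connect, to_disconnect = apply_publisher_update(connected, update)
print("connect:", to_connect)
print("disconnect:", to_disconnect)
```

Under this model, C's publisher is dropped simply because it is absent from the update, which matches the "Disconnecting from publisher" line in the log.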

Questions:

atiderko commented 4 years ago

In general, the multimaster is designed to run on each roscore. Therefore 'remote' nodes, i.e. nodes connected to a roscore from a different host, are ignored during synchronization. This is the reason why you do not see C's publisher for /test on A. It can also lead to the rviz behavior you described.

There is a parameter ~sync_remote_nodes for master_sync that also synchronizes remote nodes, but it has not been tested in larger networks.
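As a rough illustration of the effect of this parameter, the filtering can be thought of as comparing each node's host with the host of the roscore it is registered with. This is only a sketch under that assumption, not the actual master_sync implementation; the helper names is_local and nodes_to_sync and the port 40001 are hypothetical.

```python
from urllib.parse import urlparse


def is_local(node_uri, master_uri):
    """A node counts as local if it runs on the same host as its roscore."""
    return urlparse(node_uri).hostname == urlparse(master_uri).hostname


def nodes_to_sync(nodes, master_uri, sync_remote_nodes=False):
    """Return the names of nodes that would be offered for synchronization.

    nodes: mapping of node name -> node URI
    With sync_remote_nodes=False (assumed default), remote nodes are skipped.
    """
    return sorted(
        name
        for name, uri in nodes.items()
        if sync_remote_nodes or is_local(uri, master_uri)
    )


# Nodes registered at the roscore on B; the port for /data_listener is made up
master_b = "http://B:11311"
nodes = {
    "/data_listener": "http://B:40001/",
    "/rostopic_22035_1590427744189": "http://C:46543/",
}

default_sync = nodes_to_sync(nodes, master_b)
with_remote = nodes_to_sync(nodes, master_b, sync_remote_nodes=True)
print(default_sync)
print(with_remote)
```

In this model the rostopic publisher running on C is only included once sync_remote_nodes is enabled.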

I recommend using master_sync on all roscores. You can also use node_manager to start master_discovery and master_sync.

dseifert commented 4 years ago

Thanks, that fixed it. Somehow I overlooked this option :-/

Using multimaster for rviz is not possible out of the box, as the rosparams are not synced. Especially when rviz connects to a robot whose specific configuration is not known in advance, it cannot reliably set up local params (e.g. robot_description).