fkie / multimaster_fkie

ROS stack with FKIE packages for multi-robot (discovering, synchronizing and management GUI)
BSD 3-Clause "New" or "Revised" License
272 stars, 106 forks

Not all topics are synced #180

Closed: QuantuMope closed this issue 1 year ago

QuantuMope commented 2 years ago

Hello,

I have quite a peculiar problem where only specific topics are synced. First, I will explain my unique setup (which is quite similar to the one in PR #115).

I am working on machine A with a robot arm B. Each machine runs its own ROS master. I cannot access the computer on robot B because it is a proprietary robot; therefore, I must launch the multimaster nodes from a terminal on machine A with ROS_MASTER_URI set to robot B's master.

I use two launch files, one for machine A and another for robot B, but both are launched from machine A.

Machine A launch file:

<launch>
  <node name="master_discovery" pkg="fkie_master_discovery" type="master_discovery" />
  <node name="master_sync" pkg="fkie_master_sync" type="master_sync" output="screen">
    <rosparam param="sync_topics"> ['/s1/joint_states', '/robot/joint_states'] </rosparam>
  </node>
</launch>

Robot B launch file (run on machine A after export ROS_MASTER_URI=http://robot.local:11311). I use a different rpc_port, similar to what was mentioned in PR #115. I also relay the topic /robot/joint_states to /s1/joint_states, because I eventually want to connect a second robot that uses the same namespace, and I cannot change the namespace myself.

<launch>
  <node name="master_discovery" pkg="fkie_master_discovery" type="master_discovery">
    <rosparam param="rpc_port"> 11613 </rosparam>
  </node>
  <node pkg="topic_tools" type="relay" name="s1_relay" args="/robot/joint_states /s1/joint_states"/>
</launch>

The terminal output of the machine A launch file is:

started roslaunch server http://QuantumMopeSCI:43563/

SUMMARY
========

PARAMETERS
 * /master_sync/sync_topics: ['/s1/joint_state...
 * /rosdistro: melodic
 * /rosversion: 1.14.13

NODES
  /
    master_discovery (fkie_master_discovery/master_discovery)
    master_sync (fkie_master_sync/master_sync)

ROS_MASTER_URI=http://192.168.1.146:11311

process[master_discovery-1]: started with pid [22226]
process[master_sync-2]: started with pid [22227]
[INFO] [1659399736.178650]: ignore_hosts: []
[INFO] [1659399736.185075]: sync_hosts: []
[INFO] [1659399736.192174]: sync_topics_on_demand: False
[INFO] [1659399736.197407]: resync_on_reconnect: True
[INFO] [1659399736.200915]: resync_on_reconnect_timeout: 0
[INFO] [1659399736.203511]: listen for updates on /master_discovery/changes
[INFO] [1659399753.291191]: [robot.local] ignore_nodes: ['/node_manager', '/param_sync', '/master_sync', '/rosout', '/node_manager_daemon', '/zeroconf', '/master_discovery']
[INFO] [1659399753.299708]: [robot.local] sync_nodes: []
[INFO] [1659399753.306485]: [robot.local] ignore_topics: ['/master_sync/*', '/master_discovery/*', '/rosout', '/rosout_agg', '/zeroconf/*']
[INFO] [1659399753.314488]: [robot.local] sync_topics: ['/robot/joint_states', '/s1/joint_states']
[INFO] [1659399753.321949]: [robot.local] ignore_services: ['/*get_loggers', '/master_sync/*', '/master_discovery/*', '/zeroconf/*', '/*set_logger_level', '/node_manager_daemon/*']
[INFO] [1659399753.330428]: [robot.local] sync_services: []
[INFO] [1659399753.335239]: [robot.local] ignore_type: ['fkie_multimaster_msgs/SyncTopicInfo', 'fkie_multimaster_msgs/MasterState', 'fkie_multimaster_msgs/SyncServiceInfo', 'fkie_multimaster_msgs/SyncMasterInfo', 'bond/Status']
[INFO] [1659399753.340280]: [robot.local] ignore_publishers: []
[INFO] [1659399753.343617]: [robot.local] ignore_subscribers: []
[WARN] [1659399753.432426]: Resolved host of ROS_MASTER_URI robot.local=192.168.1.163 and origin discovered IP=192.168.1.146 are different. Fix your network settings and restart master_discovery!
[INFO] [1659399753.865409]: SyncThread[robot.local] Requesting remote state from 'http://192.168.1.146:11613'
[INFO] [1659399753.873816]: SyncThread[robot.local] Applying remote state...
[INFO] [1659399753.886515]: SyncThread[robot.local] Requesting remote md5sums 'http://192.168.1.146:11613'
[INFO] [1659399753.959848]: SyncThread[robot.local] remote state applied.
[INFO] [1659399755.662658]: SyncThread[robot.local] Requesting remote state from 'http://192.168.1.146:11613'
[INFO] [1659399755.703939]: SyncThread[robot.local] Applying remote state...
[INFO] [1659399755.731761]: SyncThread[robot.local] Requesting remote md5sums 'http://192.168.1.146:11613'
[INFO] [1659399755.738407]: SyncThread[robot.local] remote state applied.
[INFO] [1659399766.217683]: ROS masters obtained from '/master_discovery/list_masters': ['quantummopesci', 'robot.local']
[INFO] [1659399796.235996]: ROS masters obtained from '/master_discovery/list_masters': ['quantummopesci', 'robot.local']
[INFO] [1659399826.251965]: ROS masters obtained from '/master_discovery/list_masters': ['quantummopesci', 'robot.local']

The terminal output of the robot B launch file, also run on machine A, is:

started roslaunch server http://192.168.1.146:42433/

SUMMARY
========

PARAMETERS
 * /master_discovery/rpc_port: 11613
 * /rosdistro: melodic
 * /rosversion: 1.14.13

NODES
  /
    master_discovery (fkie_master_discovery/master_discovery)
    s1_relay (topic_tools/relay)

ROS_MASTER_URI=http://robot.local:11311

process[master_discovery-1]: started with pid [22306]
process[s1_relay-2]: started with pid [22307]

Now here is the problem: rostopic list against robot B's master shows both /s1/joint_states and /robot/joint_states, but rostopic list on machine A shows the following, where only /robot/joint_states is synced:

/diagnostics
/master_discovery/changes
/master_discovery/linkstats
/robot/joint_states
/rosout
/rosout_agg

The only distinguishing factor I can deduce is that /s1/joint_states is published by a node I start from machine A with robot B's ROS_MASTER_URI, while /robot/joint_states is published by robot B's own hardware. In fact, every topic the robot itself publishes on bootup syncs, while none of the topics I publish manually do. I wouldn't expect this to cause any issues, but maybe there is something wrong with my setup.

Do you have any insights as to what might be causing this problem?

Thanks so much.

atiderko commented 2 years ago

Hi, since master_discovery's monitoring of the ROS master causes a lot of network load, starting master_discovery "remotely" was never intended or tested. In such a case, I would start the whole system with robot B's ROS_MASTER_URI. If you still want to use your configuration, I will have to look for the bug. However, that may take some time, since I cannot estimate the effort yet.

QuantuMope commented 2 years ago

Unfortunately, I will still have to stick to this configuration.

Essentially, I would like to eventually sync two robots (B and C). Neither robot's onboard ROS can be accessed, and both use identical topic names. Because of this, the only workaround for dual-arm collaboration would be to relay all desired topics to unique names and then sync them to my local ROS master on computer A.
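For illustration, a sketch of what the launch file for a second robot C could look like, mirroring robot B's file above. The hostname robot-c.local, the rpc_port 11614 and the /s2 prefix are hypothetical placeholders, and the sketch assumes robot C also publishes /robot/joint_states:

```xml
<!-- Run on machine A after: export ROS_MASTER_URI=http://robot-c.local:11311
     (robot-c.local is a hypothetical placeholder for robot C's master) -->
<launch>
  <node name="master_discovery" pkg="fkie_master_discovery" type="master_discovery">
    <!-- Hypothetical port: it must differ from the ports used by the other
         master_discovery instances started on machine A (robot B uses 11613) -->
    <rosparam param="rpc_port"> 11614 </rosparam>
  </node>
  <!-- Relay robot C's joint states to a unique, hypothetical /s2 name -->
  <node pkg="topic_tools" type="relay" name="s2_relay" args="/robot/joint_states /s2/joint_states"/>
</launch>
```

Machine A's master_sync would then list '/s2/joint_states' alongside '/s1/joint_states' in sync_topics, leaving out the clashing '/robot/joint_states'.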

The only thing preventing a working setup is that the relayed topic is not synced, while every topic published by the robot itself is (and those are not useful on their own, because the two robots' topic names clash).

I appreciate any help towards discovering the bug. I understand it will probably be somewhat difficult.

In the meantime, I may be able to work around this with a rather messy setup that uses two additional intermediary ROS masters, D and E, to relay the topics:

robot B --> ROS D (relays B's topics to unique names) --> ROS A <-- ROS E (relays C's topics to unique names) <-- robot C

This will obviously add a lot of unwanted bandwidth, but I'm not sure how else to solve the problem. Please let me know if you have any insights into this.

Thank you!

atiderko commented 2 years ago

Hi,

I have now looked at the problem more closely, and there is indeed a way to parameterize master_sync so that it should work without patches. At least it worked when I reproduced your scenario ^^

I would also suggest reducing how often master_discovery checks the remote host for changes; with rosmaster_hz set to 0.1, the ROS master is checked only once every 10 seconds.

<launch>
  <node name="master_discovery" pkg="fkie_master_discovery" type="master_discovery" >
    <rosparam param="rosmaster_hz">0.1</rosparam>
  </node>
  <node name="master_sync" pkg="fkie_master_sync" type="master_sync" output="screen">
    <rosparam param="check_host">False</rosparam>
    <rosparam param="sync_topics"> ['/s1/joint_states'] </rosparam>
  </node>
</launch>

Sorry, the function of this parameter has slipped my mind!

QuantuMope commented 2 years ago

Unfortunately, the check_host param did not solve the issue.

I am happy to say that I was able to solve the issue by toggling a different param, sync_remote_nodes!

In fact, the description of this param matches my problem exactly, and it defaults to False:

~sync_remote_nodes (boolean, default: False) The nodes which are running not at the same host as the ROS master are not synchronized by default. Use sync_remote_nodes to sync these nodes also.

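Setting this to true solved my issue. This fits my setup: s1_relay is started from machine A (192.168.1.146), not on the host of robot B's ROS master (robot.local, 192.168.1.163), so by default its /s1/joint_states publication was not synchronized.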
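As a sketch, machine A's launch file with this change could look like the following. The exact file used in the fix was not posted, so this assumes sync_remote_nodes is simply set as a private parameter of the master_sync node on machine A; the comments are additions:

```xml
<launch>
  <node name="master_discovery" pkg="fkie_master_discovery" type="master_discovery" />
  <node name="master_sync" pkg="fkie_master_sync" type="master_sync" output="screen">
    <!-- Also sync publishers that do not run on the remote ROS master's host,
         such as s1_relay, which is started from machine A -->
    <rosparam param="sync_remote_nodes">True</rosparam>
    <rosparam param="sync_topics"> ['/s1/joint_states', '/robot/joint_states'] </rosparam>
  </node>
</launch>
```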

Before I close this issue, though, I would like to ask another question about syncing two robots, B and C (with identical topic names), to a control station A. Although syncing works perfectly for a single robot, the moment I sync a second robot, one of the two robots essentially crashes and requires a reboot. From previous issues I've read, I assume this is because the topic names overlap: even if I set ignore_hosts or sync_topics, the overlapping topics published by B are still visible to C through A. I have no way of changing the topic namespace on these two robots. Is there any way to sync B and C to control station A while keeping B and C completely invisible to each other, so as to prevent the crashes?

Thanks so much for your help again.