fkie / multimaster_fkie

ROS stack with FKIE packages for multi-robot (discovering, synchronizing and management GUI)
BSD 3-Clause "New" or "Revised" License

Name or service not known with zeroconf and multiple networks #36

Closed zacwitte closed 5 years ago

zacwitte commented 8 years ago

I'm trying to use the zeroconf master discovery, and I'm getting the error shown below on one of the two masters. I have the ROS_MASTER_URI on both masters set to http://localhost:11311. zac-nuc-2 (where the error occurs) is connected to two networks: Ethernet in the office (192.168.1.x) and a separate Wi-Fi network that is private to the robot (192.168.0.x). When I unplug from the office network, I do not get this error.

The only way I've been able to get multimaster_fkie working with my setup is to manually specify ROS_MASTER_URI and ROS_IP so that they use the IP addresses of the adapters on the same network. Is there any way to make the master_discovery node smarter about this? The manual approach is less robust because the IP addresses of the computers may change.

zac@zac-nuc-2:~$ roslaunch mbot_bringup multimaster.launch 
... logging to /home/zac/.ros/log/6db37f2e-2ce5-11e6-9e94-b8aeed747bfd/roslaunch-zac-nuc-2-32361.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://zac-nuc-2:46305/

SUMMARY
========

PARAMETERS
 * /master_discovery/mcast_group: 224.0.0.1
 * /rosdistro: indigo
 * /rosversion: 1.11.19

NODES
  /
    master_discovery (master_discovery_fkie/zeroconf)
    master_sync (master_sync_fkie/master_sync)

auto-starting new master
process[master]: started with pid [32373]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 6db37f2e-2ce5-11e6-9e94-b8aeed747bfd
process[rosout-1]: started with pid [32386]
started core service [/rosout]
process[master_discovery-2]: started with pid [32389]
process[master_sync-3]: started with pid [32397]
[INFO] [WallTime: 1465327458.935863] ignore_hosts: []
[INFO] [WallTime: 1465327458.936477] ROS Master URI: http://localhost:11311
[INFO] [WallTime: 1465327458.936834] sync_hosts: []
[INFO] [WallTime: 1465327458.937800] sync_topics_on_demand: False
[WARN] [WallTime: 1465327458.940387] Master_discovery node appear not to running. Wait for topic with type 'MasterState.
[INFO] [WallTime: 1465327458.940863] Start RPC-XML Server at ('0.0.0.0', 11911)
[INFO] [WallTime: 1465327458.941151] Subscribe to parameter `/roslaunch/uris`
[INFO] [WallTime: 1465327458.950919] Zeroconf server now running.
[INFO] [WallTime: 1465327459.805123] zac-nuc-2 is now online
[INFO] [WallTime: 1465327459.945732] listen for updates on /master_discovery/changes
[INFO] [WallTime: 1465327459.948507] Update ROS master list...
[INFO] [WallTime: 1465327459.950773] service 'list_masters' found on /master_discovery/list_masters
[INFO] [WallTime: 1465327463.576900] marble-nuc-5 is now online
[INFO] [WallTime: 1465327463.579243] ignore_nodes: ['/*node_manager', '/*master_sync*', '/rosout', '/*zeroconf', '/*master_discovery*']
[INFO] [WallTime: 1465327463.581518] sync_nodes: []
[INFO] [WallTime: 1465327463.583131] ignore_topics: ['/rosout', '/rosout_agg']
[INFO] [WallTime: 1465327463.584923] sync_topics: []
[INFO] [WallTime: 1465327463.586602] ignore_services: ['/*get_loggers', '/*set_logger_level']
[INFO] [WallTime: 1465327463.588909] sync_services: []
[INFO] [WallTime: 1465327463.592739] ignore_type: ['bond/Status']
[INFO] [WallTime: 1465327463.596651] ignore_publishers: []
[INFO] [WallTime: 1465327463.598533] ignore_subscribers: []
[INFO] [WallTime: 1465327463.599912] do_not_sync: []
[ERROR] [WallTime: 1465327464.524674] SyncThread[marble-nuc-5] ERROR: Traceback (most recent call last):
  File "/opt/ros/indigo/lib/python2.7/dist-packages/master_sync_fkie/sync_thread.py", line 237, in _request_remote_state
    remote_state = remote_monitor.masterInfoFiltered(self._filter.to_list())
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 797, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 778, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known

marble@marble-nuc-5:~$ roslaunch mbot_bringup multimaster.launch 
... logging to /home/marble/.ros/log/753e1a06-2ce5-11e6-97ef-b8aeed7d7c01/roslaunch-marble-nuc-5-26259.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
WARNING: disk usage in log directory [/home/marble/.ros/log] is over 1GB.
It's recommended that you use the 'rosclean' command.

started roslaunch server http://marble-nuc-5:34820/

SUMMARY
PARAMETERS
 * /master_discovery/mcast_group: 224.0.0.1
 * /rosdistro: indigo
 * /rosversion: 1.11.19

NODES
  /
    master_discovery (master_discovery_fkie/zeroconf)
    master_sync (master_sync_fkie/master_sync)

auto-starting new master
process[master]: started with pid [26271]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 753e1a06-2ce5-11e6-97ef-b8aeed7d7c01
process[rosout-1]: started with pid [26284]
started core service [/rosout]
process[master_discovery-2]: started with pid [26288]
process[master_sync-3]: started with pid [26293]
[INFO] [WallTime: 1465327471.380536] ROS Master URI: http://localhost:11311
[INFO] [WallTime: 1465327471.382693] ignore_hosts: []
[INFO] [WallTime: 1465327471.384258] sync_hosts: []
[INFO] [WallTime: 1465327471.385789] sync_topics_on_demand: False
[INFO] [WallTime: 1465327471.387859] Start RPC-XML Server at ('0.0.0.0', 11911)
[INFO] [WallTime: 1465327471.388208] Subscribe to parameter `/roslaunch/uris`
[WARN] [WallTime: 1465327471.388624] Master_discovery node appear not to running. Wait for topic with type 'MasterState.
[INFO] [WallTime: 1465327471.396428] Zeroconf server now running.
[INFO] [WallTime: 1465327471.397530] zac-nuc-2 is now online
[INFO] [WallTime: 1465327472.278244] marble-nuc-5 is now online
[INFO] [WallTime: 1465327472.393340] listen for updates on /master_discovery/changes
[INFO] [WallTime: 1465327472.397299] Update ROS master list...
[INFO] [WallTime: 1465327472.406435] service 'list_masters' found on /master_discovery/list_masters
[INFO] [WallTime: 1465327472.412642] ignore_nodes: ['/*node_manager', '/*master_sync*', '/rosout', '/*zeroconf', '/*master_discovery*']
[INFO] [WallTime: 1465327472.414675] sync_nodes: []
[INFO] [WallTime: 1465327472.416285] ignore_topics: ['/rosout', '/rosout_agg']
[INFO] [WallTime: 1465327472.417981] sync_topics: []
[INFO] [WallTime: 1465327472.419486] ignore_services: ['/*get_loggers', '/*set_logger_level']
[INFO] [WallTime: 1465327472.421269] sync_services: []
[INFO] [WallTime: 1465327472.422929] ignore_type: ['bond/Status']
[INFO] [WallTime: 1465327472.424495] ignore_publishers: []
[INFO] [WallTime: 1465327472.426071] ignore_subscribers: []
[INFO] [WallTime: 1465327472.427660] do_not_sync: []

atiderko commented 8 years ago

We have the same problem with multiple networks and are still looking for a smart solution. For now, you can use

rosrun node_manager_fkie remote_nm.py --package master_discovery_fkie --node_type master_discovery --node_name /master_discovery __name:=master_discovery --masteruri http://HOSTNAME:11311

and

ssh USER@REMOTE_HOST rosrun node_manager_fkie remote_nm.py --package master_discovery_fkie --node_type master_discovery --node_name /master_discovery __name:=master_discovery --masteruri http://REMOTE_HOST:11311

This script also starts the roscore with the given master URI, so you don't need to change your ROS_MASTER_URI and ROS_IP. (The script cannot be started from a launch file, because roslaunch starts the roscore first.) However, each HOSTNAME must resolve to the right IP. If it does not, the topics between the two hosts will not be connected by roscore.
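
Because everything here depends on each host name resolving to the right IP, it may help to check the resolution on both machines before starting the nodes. The following is a minimal sketch, not part of multimaster_fkie: `resolve_ipv4` is a hypothetical helper name, and it assumes Python 3 and IPv4. It uses the same `getaddrinfo` call that fails in the traceback above and returns an empty list instead of raising.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses that hostname resolves to, or [] on failure.

    Hypothetical helper for illustration; `resolver` can be swapped out in tests.
    """
    try:
        infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # This is the failure reported as "Name or service not known".
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# A numeric address resolves without any DNS or mDNS lookup.
print(resolve_ipv4("127.0.0.1"))
```

Calling `resolve_ipv4("REMOTE_HOST")` on each machine, with the real host name in place of the placeholder, shows which addresses the other side would try to reach. An empty list, or an address on the wrong network, points to a name resolution problem rather than a problem in master_sync.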


You can also use node_manager instead of shell scripts to start your multimaster network.

AlexisTM commented 6 years ago

"I have the ROS_MASTER_URI on both masters set to http://localhost:11311"

This is the problem. Set your ROS_MASTER_URI to an address that the other hosts can reach.
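
For a machine on two networks, one way to avoid hard-coding the address is to pick the local IP that shares a subnet with the other master. The following is only a sketch under stated assumptions: `pick_local_ip` and `ros_environment` are hypothetical helpers, the /24 prefix is guessed from the 192.168.0.x notation in the report, and the concrete addresses are made up for illustration. It assumes Python 3.

```python
import ipaddress


def pick_local_ip(local_ips, peer_ip, prefix_len=24):
    """Return the first local address in the same subnet as peer_ip, else None.

    prefix_len=24 is an assumption; use the real netmask of the interface.
    """
    peer = ipaddress.ip_address(peer_ip)
    for candidate in local_ips:
        # strict=False lets a host address such as 192.168.0.12/24 stand for its network.
        network = ipaddress.ip_network("{}/{}".format(candidate, prefix_len), strict=False)
        if peer in network:
            return candidate
    return None


def ros_environment(local_ip, port=11311):
    """Build ROS_IP and ROS_MASTER_URI values that other hosts can reach."""
    return {
        "ROS_IP": local_ip,
        "ROS_MASTER_URI": "http://{}:{}".format(local_ip, port),
    }


# Hypothetical addresses: office Ethernet, robot Wi-Fi, and the other master.
local = pick_local_ip(["192.168.1.40", "192.168.0.12"], "192.168.0.15")
print(ros_environment(local))
```

The resulting values would be exported before starting roscore. This still relies on knowing the address of the other master, so it reduces the manual work but does not remove it.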

atiderko commented 5 years ago

Please reopen if the problem still exists.