Closed jbhoffman613 closed 4 years ago
On startup master_sync
node waits for topic /master_discovery/changes
with type MasterState
is availabe. The publisher of this topic is created by master_discovery
. Only if this topic was found the service /master_sync/get_sync_info
will be created. At this point it is an expected behavior of master_sync
.
The question is why master_discovery
does not create the expected topic!
Can you check the output of master_discovery
, please.
master_discovery
should also advertises following topics:
$rostopic list | grep master_discovery
/master_discovery/changes
/master_discovery/linkstats
You can also run master_discovery
with
rosrun master_discovery_fkie master_discovery _log_level:=DEBUG
to see debug output if needed (Note: start master_discovery
twice if you see no debug output after first run)
Thank you so much for the response! I have shared the output of the master_discovery and the the topic list. Is there any other debug info I can share with you?
Master discover is advertising these topics (screenshot):
And here is the output of the master_discovery when we run it:
(TEXT VERSION): tian@fetch32:~$ rosrun master_discovery_fkie master_discovery _mcast_group:=224.0.0.251 _log_level:=DEBUG [DEBUG] [WallTime: 1559919730.388528] init_node, name[/master_discovery], pid[3208] [DEBUG] [WallTime: 1559919730.388708] binding to 0.0.0.0 0 [DEBUG] [WallTime: 1559919730.388885] bound to 0.0.0.0 50463 [DEBUG] [WallTime: 1559919730.389143] ... service URL is rosrpc://192.168.0.113:50463 [DEBUG] [WallTime: 1559919730.389271] [/master_discovery/get_loggers]: new Service instance [DEBUG] [WallTime: 1559919730.390337] ... service URL is rosrpc://192.168.0.113:50463 [DEBUG] [WallTime: 1559919730.390468] [/master_discovery/set_logger_level]: new Service instance [INFO] [WallTime: 1559919730.394991] ROS Master URI: http://192.168.0.113:11311
[INFO] [WallTime: 1559919730.401620] Robot hosts: []
[INFO] [WallTime: 1559919730.401706] Approx. mininum avg. network load: 1.36 bytes/s
[INFO] [WallTime: 1559919730.405354] Start RPC-XML Server at ('0.0.0.0', 11611)
[INFO] [WallTime: 1559919730.405604] Subscribe to parameter /roslaunch/uris
[INFO] [WallTime: 1559919730.408792] + Bind to specified unicast socket @(192.168.0.113:11511)
[DEBUG] [WallTime: 1559919730.408961] Ucast bind to: (192.168.0.113:11511)
[DEBUG] [WallTime: 1559919730.409156] mgroup: 224.0.0.251
[DEBUG] [WallTime: 1559919730.409265] interface : 192.168.0.113
[DEBUG] [WallTime: 1559919730.409372] inet: (2, 1, 6, '', ('192.168.0.113', 0))
[INFO] [WallTime: 1559919730.409533] Create multicast socket at ('224.0.0.251', 11511)
[DEBUG] [WallTime: 1559919730.410580] node[/master_discovery, http://192.168.0.113:39861/] entering spin(), pid[3208]
[DEBUG] [WallTime: 1559919730.559197] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919730.627435] Received a NEW heartbeat from ('localhost', 11511) via MULTICAST socket
[INFO] [WallTime: 1559919730.627656] Detected master discovery: http://localhost:11611
[DEBUG] [WallTime: 1559919730.728058] Get additional connection info from http://localhost:11611
[DEBUG] [WallTime: 1559919730.729164] Got [1559919730.510572910, http://fetch32:11311/, fetch32, /master_discovery] from http://192.168.0.113:11611/
[INFO] [WallTime: 1559919730.729554] Added master with ROS_MASTER_URI=http://fetch32:11311/
[DEBUG] [WallTime: 1559919730.729792] ... service URL is rosrpc://192.168.0.113:50463
[DEBUG] [WallTime: 1559919730.729938] [/master_discovery/list_masters]: new Service instance
[DEBUG] [WallTime: 1559919730.730995] ... service URL is rosrpc://192.168.0.113:50463
[DEBUG] [WallTime: 1559919730.731236] [/master_discovery/refresh]: new Service instance
[DEBUG] [WallTime: 1559919730.741510] Received a NEW heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[INFO] [WallTime: 1559919730.741702] Detected master discovery: http://192.168.0.110:11611
[DEBUG] [WallTime: 1559919730.742142] Received a multicast request for a state update from 192.168.0.110
[DEBUG] [WallTime: 1559919730.742349] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919730.757811] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919730.789484] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919730.842181] Get additional connection info from http://192.168.0.110:11611
[DEBUG] [WallTime: 1559919730.850891] Got [1559919682.869565964, http://192.168.0.110:11311/, 192.168.0.110, /master_discovery] from http://192.168.0.110:11611/
[INFO] [WallTime: 1559919730.851155] Added master with ROS_MASTER_URI=http://192.168.0.110:11311/
[DEBUG] [WallTime: 1559919731.410899] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919731.411220] Send requests while init 1/3
[DEBUG] [WallTime: 1559919731.411503] Skipped send request to mcast group 224.0.0.251:11511. Last send was 0.67sec ago, allowed 1.00
[DEBUG] [WallTime: 1559919731.411731] Set timer to send heartbeat in 50.00 sec
[DEBUG] [WallTime: 1559919731.453897] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919731.577225] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919731.618539] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919731.832573] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919747.892098] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919747.951971] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919751.990338] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919752.017024] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919757.104603] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919757.131462] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919779.541315] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919779.597963] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919780.762059] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919780.847693] Received a multicast request for a state update from 192.168.0.110
[DEBUG] [WallTime: 1559919780.848006] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919780.848307] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919780.912002] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919781.412166] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919781.412550] Send requests while init 2/3
[DEBUG] [WallTime: 1559919781.412897] Skipped send request to mcast group 224.0.0.251:11511. Last send was 0.56sec ago, allowed 1.00
[DEBUG] [WallTime: 1559919781.413181] Set timer to send heartbeat in 50.00 sec
[DEBUG] [WallTime: 1559919781.476384] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919830.744708] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919831.413637] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919831.413910] Send requests while init 3/3
[DEBUG] [WallTime: 1559919831.414028] Send request to mcast group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919831.414217] Set timer to send heartbeat in 50.00 sec
[DEBUG] [WallTime: 1559919831.459658] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919831.459913] Received a multicast request for a state update from localhost
[DEBUG] [WallTime: 1559919831.460191] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919831.467630] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919831.484292] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919880.759044] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919881.414789] Send current state to group 224.0.0.251:11511
[DEBUG] [WallTime: 1559919881.415188] Set timer to send heartbeat in 50.00 sec
[DEBUG] [WallTime: 1559919881.474006] Received a heartbeat from ('localhost', 11511) via MULTICAST socket
[DEBUG] [WallTime: 1559919930.743513] Received a heartbeat from ('192.168.0.110', 11511) via MULTICAST socket
hmm... perhaps it is something with host resolving in https://github.com/fkie/multimaster_fkie/blob/indigo-devel/master_discovery_fkie/src/master_discovery_fkie/interface_finder.py#L121
As hotfix (1) you can try to comment out and get_hostname(val) == get_hostname(masteruri)
or (2) replace the line https://github.com/fkie/multimaster_fkie/blob/indigo-devel/master_sync_fkie/src/master_sync_fkie/master_sync.py#L71
by topic_names = [/master_discovery/changes]
If you can fix it by hotfix (1) we can try to find the reason why it fails.
Thank you! We have done hot fix (1) and it has removed the warning message. We also now see that the fetch robot has the service:
$ rosservice list | grep sync
/master_sync/get_loggers
/master_sync/get_sync_info
/master_sync/set_logger_level
Everything looks good except on other machines we see the INFO message: ROS masters obtained from '/masters_discovery/list_masters': ['fetch', 'machine1', 'machine2']
. On the fetch robot, we never see this message in the sync node. Is that message just for aesthetics or does that tell us that there is another problem?
Here is a screenshot. The middle screen is the sync node of the fetch robot:
sorry, there is a second line with get_hostname(val) == get_hostname(masteruri)
in
https://github.com/fkie/multimaster_fkie/blob/indigo-devel/master_discovery_fkie/src/master_discovery_fkie/interface_finder.py#L209
I wonder why ROS Master and ROS node report URI with different hostname. It would be great if you add these two lines
print("nodeuri: %s" % val)
print("masteruri: %s" % masteruri)
before get_hostname(val) == get_hostname(masteruri)
.
Thank you for help debugging!
I have added the print statement in the two spots, and I have commented out get_hostname(val) == get_hostname(masteruri)
in https://github.com/fkie/multimaster_fkie/blob/indigo-devel/master_discovery_fkie/src/master_discovery_fkie/interface_finder.py#L209
Here is the new code in /src/master_discovery_fkie/interface_finder.py
. The code in lines 117 to 124 is:
# get the URI of the publisher node
for n in l:
code, msg, val = master.lookupNode(rospy.get_name(), n)
# only local publisher will be tacked
print("nodeuri: %s" % val)
print("masteruri: %s" % masteruri)
if code == 1: # and get_hostname(val) == get_hostname(masteruri):
result.append(topic)
The code in lines 206 to 214 is:
# search for a service
for srv, providers in srvs:
if srv.endswith(name):
# only local service will be tacked
code, msg, val = master.lookupService(rospy.get_name(), srv)
print("nodeuri: %s" % val)
print("masteruri: %s" % masteruri)
if code == 1: # and get_hostname(val) == get_hostname(masteruri):
result.append(srv)](url)
Here is a screenshot of the sync node output:
I hope this helps you debug it! Thank you!
I can not reproduce this behavior on my machines. So I added a new parameter check_host
to the master branch. I can be a work around for all who uses the latest multimaster version with same problems.
Describe the bug After running the master_discovery node on the fetch robot, we expected the master_sync node to automatically see that master_discovery node is running, but it does not. We are also running the master_discovery node and the master_sync on the workstation and other robots, and the master_sync node discovers the master_discovery node without problem.
To Reproduce We are following the official technical guide to set up the system. Also, when we bring up the master_discovery node other machines, the fetch robots says
ROS masters obtained from /master_discovery/list_masters
. The master_discovery node on the workstation says the same thing.When we then start the sync node (as stated in the guide), it says the
Master_discovery node appear not to running. Wait for topic with type 'MasterState.
It continually throws this error. This does not happen on any other robot/workstation.Expected behavior We expect the sync node to automatically discover the master_discovery node like the workstation.
Screenshots
catkin workspace (please complete the following information):
Additional context On the workstation and on the other bots, we noticed that they have the service
/master_sync/get_sync_info
The fetch robot does not have this service. (I don't know if this is relevant, but I thought I may add it.)It is related to: https://github.com/fkie/multimaster_fkie