fkie / multimaster_fkie

ROS stack with FKIE packages for multi-robot (discovering, synchronizing and management GUI)
BSD 3-Clause "New" or "Revised" License
271 stars 108 forks source link

master_sync gets stuck in timestamp loop #90

Closed Cartoonman closed 5 years ago

Cartoonman commented 5 years ago

I have two devices running multimaster_fkie with sync and discovery. local host is slave to the remote host's ntp server, and both are time synced. I'm using the zeroconf discovery method.

I seem to have stumbled across a strange race condition where the remote host gets to a state where it doesn't require a request to the local host, but the local host gets stuck in a request loop to the remote host.

Specifically lines 433-437 in sync_thread.py is where the crux of the problem occurs. The timestamps timestamp_remote and stamp_local do not change with each request, so it just keeps sending RPC requests over the network.

My guess is that there seems to be a problem in the way master state and timestamps are updated with zeroconf, since I have seen that if I specifically monitor zeroconf/changes through the node_manager, the remote host's timestamp in sync_thread does get updated to the latest one after a request (and stays constant thereafter), but the local host's timestamp remains the same (as reported by the remote host).

I've been able to somewhat reproduce the issue by staging a launch file through default config launch on the local host and monitoring master_sync's output, as well as over Wireshark.

atiderko commented 5 years ago

I did not tested zeroconf method for long time... Do you have the debug output of master_sync, especially with lines: 433 and 435 in sync_thread.py

You can add /master_sync/log_level parameter with DEBUG value and restart master_sync to see all debug messages since start: $ rosparam set /master_sync/log_level DEBUG

I will try to reproduce the problem!

Cartoonman commented 5 years ago

Yes, here's a snippet. I have two remote devices but one of them does not have the sync node on.


[DEBUG] [1544204739.649943]: init_node, name[/master_sync], pid[29102]
[DEBUG] [1544204739.651098]: binding to 0.0.0.0 0
[DEBUG] [1544204739.651697]: bound to 0.0.0.0 34495
[DEBUG] [1544204739.652648]: ... service URL is rosrpc://localmachine:34495
[DEBUG] [1544204739.653185]: [/master_sync/get_loggers]: new Service instance
[DEBUG] [1544204739.656799]: ... service URL is rosrpc://localmachine:34495
[DEBUG] [1544204739.657328]: [/master_sync/set_logger_level]: new Service instance
]2;master_sync[INFO] [1544204739.672933]: ignore_hosts: []
[INFO] [1544204739.676031]: sync_hosts: []
[INFO] [1544204739.681605]: sync_topics_on_demand: False
[INFO] [1544204739.684349]: resync_on_reconnect: True
[INFO] [1544204739.687261]: resync_on_reconnect_timeout: 0
[INFO] [1544204739.699292]: listen for updates on /zeroconf/changes
[DEBUG] [1544204739.705786]: ... service URL is rosrpc://localmachine:34495
[DEBUG] [1544204739.706325]: [/master_sync/get_sync_info]: new Service instance
[DEBUG] [1544204739.725828]: connecting to localmachine 43593
[DEBUG] [1544204739.728413]: connecting to localmachine 43593
[INFO] [1544204739.731472]: ROS masters obtained from '/zeroconf/list_masters': ['localmachine', 'remote-host-2', 'remote-host-1']
[DEBUG] [1544204739.732141]: SyncThread[remote-host-2]: create this sync thread
[INFO] [1544204739.739170]: [remote-host-2] ignore_nodes: ['/master_sync', '/node_manager', '/rosout', '/zeroconf']
[INFO] [1544204739.744759]: [remote-host-2] sync_nodes: []
[INFO] [1544204739.749773]: [remote-host-2] ignore_topics: ['/rosout', '/rosout_agg']
[INFO] [1544204739.755276]: [remote-host-2] sync_topics: []
[INFO] [1544204739.760367]: [remote-host-2] ignore_services: ['/*get_loggers', '/*set_logger_level']
[INFO] [1544204739.765872]: [remote-host-2] sync_services: []
[INFO] [1544204739.771228]: [remote-host-2] ignore_type: ['bond/Status']
[INFO] [1544204739.776739]: [remote-host-2] ignore_publishers: []
[INFO] [1544204739.781756]: [remote-host-2] ignore_subscribers: []
[DEBUG] [1544204739.790722]: SyncThread[remote-host-2]: update notify new timestamp(1544194679.901128054), old(0.000000000)
[DEBUG] [1544204739.791761]: SyncThread[remote-host-1]: create this sync thread
[INFO] [1544204739.800109]: [remote-host-1] ignore_nodes: ['/master_sync', '/node_manager', '/rosout', '/zeroconf']
[INFO] [1544204739.805337]: [remote-host-1] sync_nodes: []
[INFO] [1544204739.810586]: [remote-host-1] ignore_topics: ['/rosout', '/rosout_agg']
[INFO] [1544204739.814409]: [remote-host-1] sync_topics: []
[INFO] [1544204739.817172]: [remote-host-1] ignore_services: ['/*get_loggers', '/*set_logger_level']
[INFO] [1544204739.820132]: [remote-host-1] sync_services: []
[INFO] [1544204739.822310]: [remote-host-1] ignore_type: ['bond/Status']
[INFO] [1544204739.824511]: [remote-host-1] ignore_publishers: []
[INFO] [1544204739.826416]: [remote-host-1] ignore_subscribers: []
[DEBUG] [1544204739.829017]: SyncThread[remote-host-1]: update notify new timestamp(1544198997.926506042), old(0.000000000)
[DEBUG] [1544204739.829589]: node[/master_sync, http://localmachine:38591/] entering spin(), pid[29102]
[INFO] [1544204740.294743]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204740.301757]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204740.305221]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204740.306395]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204740.308146]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204740.776493]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204740.783588]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204740.787265]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204740.788438]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204740.789904]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204741.439328]: SyncThread[remote-host-2] Requesting remote state from 'http://remote-host-2:11911/'
[INFO] [1544204741.444130]: SyncThread[remote-host-2] Applying remote state...
[DEBUG] [1544204741.445528]: SyncThread[remote-host-2]: current timestamp 1544194679.901128054, local 1544194679.901128054
[INFO] [1544204741.445939]: SyncThread[remote-host-2] remote state applied.
[INFO] [1544204741.809589]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204741.816672]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204741.820567]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204741.821615]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204741.823084]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204742.950957]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204742.958130]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204742.962188]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204742.963614]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204742.965301]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204743.967549]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204743.972471]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204743.973838]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204743.974213]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204743.974678]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204744.936028]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204744.943481]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204744.947471]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204744.948562]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204744.950158]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204746.668732]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204746.675722]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204746.679210]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204746.680397]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204746.681249]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204748.208464]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204748.215949]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204748.219789]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204748.220773]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204748.222704]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204749.580450]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204749.587525]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204749.591136]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204749.592172]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204749.593571]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204751.449565]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204751.456952]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204751.460759]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204751.462077]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204751.464479]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204751.809529]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204751.816471]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204751.820103]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204751.821233]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204751.822658]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204752.143654]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204752.150591]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204752.154075]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204752.155255]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204752.156806]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204753.602383]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204753.609825]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204753.613684]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204753.614744]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204753.616193]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204754.969740]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204754.976392]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204754.979364]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204754.980047]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204754.980849]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204755.999198]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204756.006569]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204756.010164]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204756.011159]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204756.012561]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204757.168893]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204757.175884]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204757.179585]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204757.180600]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204757.182012]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204759.066807]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204759.074215]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204759.077878]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204759.078803]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204759.080224]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204760.496236]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204760.503516]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204760.507143]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204760.508191]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204760.509552]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204761.314378]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204761.321552]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204761.325336]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204761.326377]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204761.327792]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204761.524902]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204761.531846]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204761.535338]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204761.536306]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204761.537572]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204763.350769]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204763.358358]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204763.362072]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204763.363138]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204763.364558]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204763.449291]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204763.455983]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204763.459664]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204763.460700]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204763.462259]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204763.686774]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204763.693593]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204763.697242]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204763.698498]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204763.699971]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204764.551961]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204764.559039]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204764.562733]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204764.563791]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204764.565095]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204765.147432]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204765.154738]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204765.158325]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204765.159350]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204765.160596]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204765.328071]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204765.335167]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204765.338874]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204765.339870]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204765.341219]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204766.072632]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204766.079480]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204766.081684]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204766.082218]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204766.082851]: SyncThread[remote-host-1] remote state applied.
[INFO] [1544204767.465325]: SyncThread[remote-host-1] Requesting remote state from 'http://remote-host-1:11911/'
[INFO] [1544204767.472247]: SyncThread[remote-host-1] Applying remote state...
[DEBUG] [1544204767.475803]: SyncThread[remote-host-1]: current timestamp 1544198997.926506042, local 1544196658.900779963
[DEBUG] [1544204767.476908]: SyncThread[remote-host-1]: invoke next update, remote ts: 1544198997.926506042
[INFO] [1544204767.478322]: SyncThread[remote-host-1] remote state applied.
atiderko commented 5 years ago

I added a fix to avoid request loop. I hope it this solve the problem.

Long time ago I added a new timestamp for changes which do not regard the changes caused by master_sync to avoid recyncs. I added it only to master_discovery :-/

Thank you for reporting this issue!

Cartoonman commented 5 years ago

Hotfix works! Thanks for your very fast reply and fix!

Cheers!