fkie / multimaster_fkie

ROS stack with FKIE packages for multi-robot systems (discovery, synchronization and management GUI)
BSD 3-Clause "New" or "Revised" License

Synchronization of remote nodes? #5

Closed: mjschuster closed this issue 11 years ago

mjschuster commented 11 years ago

Hi, it seems that only nodes running on the same machine as the roscore are synchronized. Am I interpreting these lines correctly:

def _getNodeUri(self, node, nodes):
    for (nodename, uri, masteruri, pid, local) in nodes:
        if (nodename == node) and local == 'local':
            [...]

in https://github.com/fkie/multimaster_fkie/blob/11a4e85936adf4577e94dd3cedcbc480fb185e4d/master_sync_fkie/src/master_sync_fkie/sync_thread.py#L325-L327? Would there be an easy way to synchronize remote nodes too (i.e. keeping two ROS masters, each with a distributed system, in sync)? (Or am I misinterpreting the name 'local' here? I haven't dug through all of the code yet.)

Thank you! Cheers, M. (currently working with revision 11a4e85936adf4577e94dd3cedcbc480fb185e4d)
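To make the question concrete, here is a minimal standalone re-creation of the filter quoted above. It assumes each entry in nodes is a (nodename, uri, masteruri, pid, local) tuple, that local is 'local' for nodes on the same host as the ROS master and 'remote' otherwise, and that the elided [...] body returns the node URI. The function name get_node_uri_original and the node list are hypothetical, made up for illustration.

```python
# Hypothetical re-creation of the original filter in _getNodeUri.
# Assumption: the elided body returns the node URI on a match.


def get_node_uri_original(node, nodes):
    """Return the node URI only if the node runs on the master's host."""
    for (nodename, uri, masteruri, pid, local) in nodes:
        if nodename == node and local == 'local':
            return uri
    return None


# Made-up node list: /talker runs on the master's host, /camera does not.
NODES = [
    # (nodename, uri, masteruri, pid, local)
    ('/talker', 'http://robot1:40001/', 'http://robot1:11311/', 101, 'local'),
    ('/camera', 'http://robot1-cam:40002/', 'http://robot1:11311/', 202, 'remote'),
]

if __name__ == '__main__':
    print(get_node_uri_original('/talker', NODES))  # http://robot1:40001/
    print(get_node_uri_original('/camera', NODES))  # None: skipped as 'remote'
```

Under these assumptions, a node registered with the same master but running on another machine is never returned, which matches the behavior described above.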

atiderko commented 11 years ago

Hi,

"only nodes running on the same machine as the roscore are synchronized": this is correct!

It was not the primary idea, but if you want to synchronize remote nodes, try replacing if (nodename == node) and local == 'local': with if (nodename == node) and uri == masteruri:

It's not tested, but I hope it will work ;)

Regards, Alex

(I strongly recommend updating to at least the next commit, https://github.com/fkie/multimaster_fkie/commit/671451f00d37a34c7c00407052c84994791369b7. It fixes deadlocks in slow networks and other errors!)

mjschuster commented 11 years ago

Hi Alex, thank you for the quick answer! Is uri the node URI? Isn't that always different from the masteruri, at least with respect to the port number (so that uri == masteruri always returns False)? Would it be possible to leave out the and local == 'local' part altogether, or could that cause problems, e.g. in setups with more than two robots? Cheers, Martin

P.S.: OK, thanks, I will try to update soon. We are still using ROS Fuerte; do you know whether the more recent updates are still compatible with it? [update]: I've just discovered your fuerte-devel branch. That probably answers this question.[/update]
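Martin's port-number argument can be checked with a few lines of standard-library Python. The URIs below are hypothetical examples: the ROS master listens on its default port 11311, while each node runs its own XML-RPC server on a different port, so even on the same host the two URIs never compare equal. The helper name describe is made up for this sketch.

```python
from urllib.parse import urlparse

# Hypothetical example URIs on the same host.
MASTER_URI = 'http://robot1:11311/'  # ROS master (default port)
NODE_URI = 'http://robot1:40001/'    # a node's own XML-RPC server


def describe(uri):
    """Split a ROS XML-RPC URI into (hostname, port)."""
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port


if __name__ == '__main__':
    print(describe(MASTER_URI))  # ('robot1', 11311)
    print(describe(NODE_URI))    # ('robot1', 40001)
    # Same host, different port, so the proposed check cannot match:
    print(NODE_URI == MASTER_URI)  # False
```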

atiderko commented 11 years ago

Hi Martin,

Sorry, I was distracted. Of course it always returns False. You have to change a bit more. First, add remote_masteruri as a parameter:

def _getNodeUri(self, node, nodes, remote_masteruri):
    for (nodename, uri, masteruri, pid, local) in nodes:
        if (nodename == node) and remote_masteruri == masteruri:
            # the node was originally registered with another ROS master -> sync it
            if masteruri != self.localMasteruri:
                return uri
    return None

Second, replace all calls self._getNodeUri(node, nodeProviders) with self._getNodeUri(node, nodeProviders, remote_masteruri).

This time it should work!

If you leave out and local == 'local', you could get synchronization problems with more than two ROS masters. If you replace it with remote_masteruri == masteruri, you can distinguish whether the node was started with the remote ROS master or was only synchronized to it.

Regards, Alex

PS: You need to make the same change for the services.
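The patched filter can be exercised outside ROS with a small three-master scenario. In this sketch, SyncThreadSketch is a hypothetical stand-in for the real SyncThread; only the localMasteruri attribute and the body of _getNodeUri come from the code above. The node list is made up and assumes that master B also lists a node it merely synchronized from master C, tagged with C's masteruri.

```python
# Hypothetical stand-in for SyncThread, running on master A and
# syncing from master B in a made-up three-master setup (A, B, C).


class SyncThreadSketch:
    def __init__(self, local_masteruri):
        self.localMasteruri = local_masteruri

    def _getNodeUri(self, node, nodes, remote_masteruri):
        for (nodename, uri, masteruri, pid, local) in nodes:
            # Only accept nodes registered with the master we sync from,
            # regardless of whether they run on that master's host.
            if nodename == node and remote_masteruri == masteruri:
                # The node originally belongs to another ROS master -> sync it.
                if masteruri != self.localMasteruri:
                    return uri
        return None


MASTER_A = 'http://a:11311/'
MASTER_B = 'http://b:11311/'

# Made-up node list as reported by master B.
NODES_ON_B = [
    ('/talker', 'http://b:40001/', 'http://b:11311/', 101, 'local'),
    ('/cam', 'http://b-cam:40002/', 'http://b:11311/', 202, 'remote'),
    ('/gps', 'http://c:40003/', 'http://c:11311/', 303, 'remote'),  # synced from C
]

if __name__ == '__main__':
    thread = SyncThreadSketch(MASTER_A)
    print(thread._getNodeUri('/cam', NODES_ON_B, MASTER_B))  # http://b-cam:40002/
    print(thread._getNodeUri('/gps', NODES_ON_B, MASTER_B))  # None
```

Here the remote node /cam is now synchronized, while /gps is skipped because it belongs to master C; A's sync thread for C would pick it up there, which avoids the multi-master problems mentioned above.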

mjschuster commented 11 years ago

Hi Alex, thank you, that seems to work partially:

When publishing from a remote node, the log output (from master_sync on the other ROS master)

    SyncThread[...] topic advertised [...]
    SyncThread[...] topic unadvertised: [...]

is missing, but the messages on the synchronized topic are being sent and received.

In the case of a subscriber on a remote node, the log output

    SyncThread[...] topic subscribed: [...]
    SyncThread[...] topic unsubscribed: [...]

is also missing, and the first one to three messages on the synchronized topic are often lost. After that it seems to work.

Maybe somewhere in this hack (https://github.com/fkie/multimaster_fkie/blob/master/master_sync_fkie/src/master_sync_fkie/sync_thread.py#L317-L368) an update isn't triggered right away?

(So far I have only done some quick tests with rostopic pub/echo in a two-master setup, without testing any services.)

Cheers, Martin

P.S.: I updated to your fuerte-devel branch before making the changes, which you can have a look at here: https://github.com/mjschuster/multimaster_fkie/commit/d09b4d3dc31ee9c7a4662bbd94424ea3bbb8e17a

atiderko commented 11 years ago

Hi Martin,

I will test the log output again...

The problem with missing messages cannot be solved by the synchronization. The publisher publishes its message before the sync_thread has registered the new publisher, or before the connection between publisher and subscriber has been established. After registering a publisher (or subscriber), you must wait some time until the connection is established. If you send a message while the connection is still being established, the message will be dropped.

You can reproduce this without synchronization by creating a publisher and sending a message immediately.

Regards, Alex
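One common way to "wait a time" instead of sleeping blindly is to poll the publisher's connection count before the first publish. This is a swapped-in technique, not something from master_sync. The sketch below is plain Python so it runs without ROS: wait_for_subscribers and FakePublisher are hypothetical names, and the only assumption about the publisher is a zero-argument get_num_connections() callable.

```python
import time


def wait_for_subscribers(get_num_connections, timeout=5.0, poll_interval=0.05):
    """Poll until at least one connection exists or the timeout expires.

    get_num_connections: any zero-argument callable returning the current
    number of connections. Returns True if a connection was seen in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_num_connections() > 0:
            return True
        time.sleep(poll_interval)
    return get_num_connections() > 0


class FakePublisher:
    """Hypothetical stand-in that reports a connection after a few polls."""

    def __init__(self, connect_after_polls):
        self._polls = 0
        self._connect_after = connect_after_polls

    def get_num_connections(self):
        self._polls += 1
        return 1 if self._polls > self._connect_after else 0


if __name__ == '__main__':
    pub = FakePublisher(connect_after_polls=3)
    print(wait_for_subscribers(pub.get_num_connections, timeout=1.0))  # True
```

With rospy, the bound method get_num_connections of a rospy.Publisher could be passed in the same way before the first publish() call. In a synchronized setup the subscriber must first be registered by master_sync, so the timeout has to cover that delay as well. A latched publisher (latch=True), which resends its last message to newly connected subscribers, is an alternative when only the most recent message matters.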

atiderko commented 11 years ago

Use the parameter sync_remote_nodes to synchronize remote nodes. It was added with patch 3876b3ed977a4036b81277fb4911d545541bbe16.