Closed by akifh 2 years ago
Hi @akifh , thank you for using the multimaster.
I have also noticed that the Python nodes/topics are not reconnecting. I'm also not sure whether the connection loss shows up in roslog. To address this issue, master_sync triggers a reconnect if the other host was offline.
To detect short disconnects you can try to set the parameter "heartbeat_hz" of the master_discovery to e.g. 2Hz:
heartbeat_hz:=2
Thanks for the reply.
I tried some other tests. With heartbeat_hz increased to 2, the problem still occurs. The nodes are C++ nodes, so I'm not sure what it is related to.
Let me explain the situation better with some further information.
On Machine A, I have Node 1.
On Machine B, I have Node 2.
The machines are syncing in unicast mode. Node 1 is constantly publishing a message to Node 2. At some point, Node 2 says it is no longer receiving messages from Node 1.
At this time:
When I run list_masters on both machines, both masters seem online.
When I run rostopic echo on Machine A, I get messages from Node 1.
When I run rostopic echo on Machine B, I get messages from Node 1, but Node 2 is still not receiving.
When I restart Node 1 or Node 2, it starts working again.
I hope this helps to diagnose the issue, if it is related to multimaster. Best
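To pin down the exact moment Node 2 stops receiving, a small watchdog around the subscriber callback can help. The following is only a sketch: the class name TopicWatchdog, the timeout value, and the injected clock are assumptions made for illustration and are not part of multimaster or ROS.

```python
import time


class TopicWatchdog:
    """Tracks when the last message arrived and reports a stall.

    Hypothetical helper for illustration; not part of ROS or multimaster.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_msg_time = None   # no message seen yet

    def on_message(self, _msg=None):
        # Intended to be called from the subscriber callback.
        self._last_msg_time = self._clock()

    def is_stalled(self):
        # A topic that has never delivered a message is not reported as stalled.
        if self._last_msg_time is None:
            return False
        return (self._clock() - self._last_msg_time) > self.timeout_s
```

In a rospy node, on_message could be passed as the subscriber callback and is_stalled polled from a timer, logging a timestamp when it first returns True. The same idea carries over to a C++ node. Comparing that timestamp with the master_sync output would show whether the two events coincide.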
I would next check to see if master_sync does anything before the connection between topics disappears.
Launch each master_sync in a terminal and set the log level to debug. (You can use the _log_level:=DEBUG parameter and start the sync node twice.)
If the connection between the nodes disappears and master_sync shows no activity, then I would look for the problem in ROS; otherwise you have to look at what master_sync did.
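The check described above can be sketched as a small decision helper. It assumes you have already extracted (timestamp, message) pairs from the master_sync debug output yourself; the function names, the 10-second window, and the sample messages are hypothetical and do not reflect the actual master_sync log format.

```python
def sync_activity_before(events, drop_time, window_s=10.0):
    """Return the events logged within window_s seconds before drop_time.

    events: iterable of (timestamp_seconds, message) pairs, assumed to be
    extracted by hand or by your own parser from the master_sync output.
    """
    return [
        (ts, msg)
        for ts, msg in events
        if drop_time - window_s <= ts <= drop_time
    ]


def where_to_look(events, drop_time, window_s=10.0):
    # Activity shortly before the drop points at master_sync;
    # silence points at ROS itself.
    if sync_activity_before(events, drop_time, window_s):
        return "master_sync"
    return "ROS"
```

For example, with placeholder events at 100.0 s, 118.5 s and 125.0 s and a drop observed at 120.0 s, only the 118.5 s entry falls inside the default window, so the helper suggests inspecting what master_sync did at that moment.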
Thanks for the guidance. I've started to believe that this is a ROS issue, not MM, but I will check thoroughly. I will report back here once we have run our tests.
It was resolved as an issue in roscpp. I'm closing this. Thanks.
Hi @atiderko, thank you for your great work on mm. I have a problem and would like to ask you how to debug it.
I have two machines synced via multimaster (master_discovery and master_sync) in unicast mode. Everything works as expected for a while. But at some point, one machine stops receiving messages on a topic from the other machine. This may be caused by a WiFi reconnection (due to poor signal strength); I'm not sure about that, but we lose synchronization.
It only resumes receiving messages once we restart the node publishing to topic. Once it is restarted, it starts receiving again on the other machine.
It seems to me like a reconnection issue, but I'm not sure whether it is related to multimaster (master_sync in particular) or to ROS itself.
Can you guide me on how to debug this issue?
Thanks for your help, Best