gazebosim / gz-transport

Transport library for component communication based on publication/subscription and service calls.
https://gazebosim.org
Apache License 2.0
29 stars 36 forks source link

Fix topic/service list inconsistency #415

Closed caguero closed 1 year ago

caguero commented 1 year ago

🦟 Bug fix

Fixes #400

Summary

Discovery periodically sends topic/service heartbeats to keep all the topics/services advertised by a given node alive in the network. These messages are received by other discovery processes that keep the topics/services up to date or remove them in the absence of heartbeat updates after some time.

In order to get the list of topics/services, gz needs to initialize the discovery. This means that discovery should wait some time to receive all the topics/services updates to create the topic/service list. The original intent was to wait two cycles (2 * heartbeatInterval) before we consider the discovery initialized. However the current code only waits one cycle, as we were using the amount of heartbeats sent as the condition. That's essentially one cycle, which wasn't the original intent. The simple fix wait two real heartbeat cycles now.

How to test it?

You can reproduce the issue without this patch. First, run the simulation:

gz sim shapes.sdf

Then, check that the gz service -l is consistent.

watch -g -n 3 'gz service -l | wc -l'

This command will exit when the output is different than the last command. It should fail after a few tries. Then, repeat the testing procedure with this patch. The watch call should never finish, meaning the output of gz service -l is consistent.

Note: My plan is to merge this patch from Citadel, and then, forward port it.

Checklist

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

caguero commented 1 year ago

changes look good. CI failures seem unrelated. Are the flakes or do they need to be addressed?

Probably unrelated with this PR but I'm compiling Citadel on Docker as I don't fully understand why that test is failing.

codecov[bot] commented 1 year ago

Codecov Report

Merging #415 (c277564) into ign-transport8 (cb1c95b) will decrease coverage by 0.05%. The diff coverage is 88.10%.

:exclamation: Current head c277564 differs from pull request most recent head 5b671de. Consider uploading reports for the commit 5b671de to get more accurate results

@@                Coverage Diff                 @@
##           ign-transport8     #415      +/-   ##
==================================================
- Coverage           83.61%   83.56%   -0.05%     
==================================================
  Files                  51       51              
  Lines                5035     5039       +4     
==================================================
+ Hits                 4210     4211       +1     
- Misses                825      828       +3     
Impacted Files Coverage Δ
include/gz/transport/Packet.hh 0.00% <0.00%> (ø)
...og/include/gz/transport/log/detail/QueryOptions.hh 100.00% <ø> (ø)
log/src/Batch.cc 89.28% <ø> (ø)
log/src/Descriptor.cc 89.65% <ø> (ø)
log/src/Descriptor.hh 100.00% <ø> (ø)
log/src/Log.cc 78.72% <ø> (ø)
log/src/Message.cc 100.00% <ø> (ø)
log/src/MsgIter.cc 80.43% <ø> (ø)
log/src/QualifiedTime.cc 97.81% <ø> (ø)
log/src/QueryOptions.cc 100.00% <ø> (ø)
... and 40 more
caguero commented 1 year ago

I couldn't reproduce the issue with that test locally, so I've disabled it in Citadel.