gazebosim / gz-transport

Transport library for component communication based on publication/subscription and service calls.
https://gazebosim.org
Apache License 2.0
29 stars 36 forks source link

Allow programmatic configuration of unicast relays. #498

Closed mbeards closed 1 month ago

mbeards commented 2 months ago

This change allows users to configure relays from code without having to setenv(GZ_RELAY).

🎉 New feature

Summary

This change allows users to configure relays from code without having to setenv(GZ_RELAY).

Supersedes https://github.com/gazebosim/gz-transport/pull/497

Test it

This can be tested by starting a Node across a boundary where multicast isn't enabled, and setting the relay IP via NodeOptions.

Checklist

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

mbeards commented 2 months ago

Friendly ping @caguero

mbeards commented 2 months ago

By making Discovery::AddRelayAddress() public, I think we now have risk for potential race conditions, as this->relayAddrs can be modified from a user thread (via Node) and the discovery thread can read the same variable via SendUnicast().

Should I add a new mutex to guard relayAddrs or just use the existing Discovery::mutex?

caguero commented 2 months ago

By making Discovery::AddRelayAddress() public, I think we now have risk for potential race conditions, as this->relayAddrs can be modified from a user thread (via Node) and the discovery thread can read the same variable via SendUnicast().

Should I add a new mutex to guard relayAddrs or just use the existing Discovery::mutex?

I think you can reuse Discovery::mutex.

mbeards commented 2 months ago

By making Discovery::AddRelayAddress() public, I think we now have risk for potential race conditions, as this->relayAddrs can be modified from a user thread (via Node) and the discovery thread can read the same variable via SendUnicast().

Should I add a new mutex to guard relayAddrs or just use the existing Discovery::mutex?

I think you can reuse Discovery::mutex.

Is there a performance concern to worry about re: serializing the user thread + discovery thread in the presence of unicast relays? Every unicast write would need to acquire that mutex.

mbeards commented 1 month ago

Thoughts on this test failure https://build.osrfoundation.org/job/gz_transport-pr-win/99/ ? I'm having a hard time figuring out how this PR would trigger behavior change on those tests.

caguero commented 1 month ago

By making Discovery::AddRelayAddress() public, I think we now have risk for potential race conditions, as this->relayAddrs can be modified from a user thread (via Node) and the discovery thread can read the same variable via SendUnicast().

Should I add a new mutex to guard relayAddrs or just use the existing Discovery::mutex?

I think you can reuse Discovery::mutex.

Is there a performance concern to worry about re: serializing the user thread + discovery thread in the presence of unicast relays? Every unicast write would need to acquire that mutex.

By making Discovery::AddRelayAddress() public, I think we now have risk for potential race conditions, as this->relayAddrs can be modified from a user thread (via Node) and the discovery thread can read the same variable via SendUnicast().

Should I add a new mutex to guard relayAddrs or just use the existing Discovery::mutex?

I think you can reuse Discovery::mutex.

Is there a performance concern to worry about re: serializing the user thread + discovery thread in the presence of unicast relays? Every unicast write would need to acquire that mutex.

I'd expect a bit of impact if unicast is used because the mutex will be acquired when calling DispacthDiscoveryMsg(). It shouldn't make a big difference.

caguero commented 1 month ago

Thoughts on this test failure https://build.osrfoundation.org/job/gz_transport-pr-win/99/ ? I'm having a hard time figuring out how this PR would trigger behavior change on those tests.

ReplayTest looks unrelated.

mbeards commented 1 month ago

Friendly ping @caguero .

mbeards commented 1 month ago

Minor comments.

Done.

mbeards commented 1 month ago

Not sure why the ABI checker doesn't catch it but I think that changing the access right (from private to public in our case) in Discovery::AddRelayAddress() breaks ABI.

I think the solution is to create a new public version of the function that calls the existing one.

Let's just target main?

mjcarroll commented 1 month ago

Not sure why the ABI checker doesn't catch it but I think that changing the access right (from private to public in our case) in Discovery::AddRelayAddress() breaks ABI.

I think it's actually fine on Linux, it's only an issue with MSVC where the symbol mangling includes the access rights

(Edit: but targeting main is easy, too)

caguero commented 1 month ago

@osrf-jenkins run tests

caguero commented 1 month ago

Not sure why this gz_transport-ci-pr_any-jammy-amd64 job is stuck.

azeey commented 1 month ago

Our CI was recently changed so that main branches only running tests on Noble. But the branch protection for gz-transport wasn't updated. Should be good now.

azeey commented 1 month ago

We do need to update the GitHub workflow to use Noble . @caguero would you mind doing that when you get a chance?

caguero commented 1 month ago

We do need to update the GitHub workflow to use Noble . @caguero would you mind doing that when you get a chance?

Thanks @scpeters https://github.com/gazebosim/gz-transport/pull/504