GZ->ROS communication doesn't work when using the bridge and gz_server in the same composable container

azeey commented 1 month ago

Part of #544

When the bridge and gz_server nodes are in the same process, GZ->ROS communication doesn't work because of https://github.com/gazebosim/ros_gz/blob/cfc94bc0dcf880296c37920cf527b881efc7cb4a/ros_gz_bridge/src/factory.hpp#L112-L115

which ignores all gz-transport messages published from the same process. This was done to enable bidirectional bridges, so I don't think we can simply remove it.

caguero commented 1 month ago

I thought about this and have a few ideas:

1. Solve it a Transport level: Similar to ROS, add a new IgnoreLocalMessages function to the SubscribeOptions class. This will ignore messages coming from the same node. Note that this will still allow to receive messages from other nodes within the same process. For that, we might need to extend the PublishMsgDetails struct to store the UUID of the sender node. Then, when processing the local handlers before running a local callback, we'll compare the sender UUID with the UUID of the subscriber node (this UUID is already in the SubscriptionHandler class). If they match and the IgnoreLocalMessages option is set, we'll discard this message.

2. Solve it at the bridge with extra support from Transport.

I need to review the bridge code but another idea could be to modify Transport to include the Node UUID of the sender in the message sent. That way, it could be included in the MessageInfo instance when passed to each callback (on the receiver side).

Then, in the bridge, when receiving a message from a callback we could check the Node UUID of the sender. If that matches the UUID of the bridge's publisher we know we should ignore the message to avoid loops.

I'm leaning towards (1) because if doesn't add any extra overhead to the network protocol.

azeey commented 1 month ago

Solve it a Transport level: Similar to ROS, add a new IgnoreLocalMessages function to the SubscribeOptions class. This will ignore messages coming from the same node. Note that this will still allow to receive messages from other nodes

I would also be in favor of (1), but I think we would only need the IgnoreLocalMessages option because it's okay to receive messages from other nodes in the same process. From https://github.com/gazebosim/ros_gz/blob/cfc94bc0dcf880296c37920cf527b881efc7cb4a/ros_gz_bridge/src/bridge_handle.hpp#L97, each "bridge" has a single gz-transport Node, so the double message issue would only occur if we receive a message from the same node.

caguero commented 1 month ago

I think we need a solution where "real" publishers share the process with the bridge, which should be the case where we use composition.

That's the case where the current bridge implementation doesn't work as we're relying only on the isIntraprocess() option. That will discard all messages from "real" publishers. And if we comment it out, we'll generate a loop of messages (or at least a duplication).

That's why I think proposal (1) has a finer control because we'll discriminate the case where the bridge subscriber receives the data publisher from the bridge. I think that's what we want (we should set that ignorelocalMessages to true in the bridge.

Do you agree?

azeey commented 1 month ago

Yes, I agree. I'm saying we don't need the part where you said

For that, we might need to extend the PublishMsgDetails struct to store the UUID of the sender node.

We just need to implement the IgnoreLocalMessages option from (1).

azeey commented 1 month ago

Maybe I'm misunderstanding. I thought in (1) you were stating two solutions. (a) IgnoreLocalMessages that still allows messages from other nodes in the same process. (b) A further modification to disable messages from nodes in the same process. Is that not what you're saying?

caguero commented 1 month ago

Sorry if this was a bit confusing. My proposal in (1) is a single solution and I was describing some implementation details to do it. The ignoreLocalMessages is essentially a flag that you could use on subscription to disable messages published from the same node. Independently of how you set ignoreLocalMessages, messages from other nodes should always be received.

azeey commented 1 month ago

I see. Well, let's go ahead with (1) then since we're both on the same page.

caguero commented 1 month ago

See https://github.com/gazebosim/gz-transport/pull/506 and https://github.com/gazebosim/ros_gz/pull/559

gazebosim / ros_gz

GZ->ROS communication doesn't work when using the bridge and gz_server in the same composable container #555