gazebosim / gz-transport

Transport library for component communication based on publication/subscription and service calls.
https://gazebosim.org
Apache License 2.0
29 stars 36 forks source link

Node::Advertise doesn't fail when advertising a duplicate service #217

Open peci1 opened 3 years ago

peci1 commented 3 years ago

I'd expect that there can only be one "responser" for each advertised service. But that's not true.

Taking code from the examples:

$ ./responser_oneway &
[1] 10281
$ sleep 5 # to be sure the first responser gets properly advertised
$ ./responser_oneway &
[2] 10313

$ ign service -s /oneway -i
Service providers [Address, Request Message Type, Response Message Type]:
  tcp://172.17.0.1:39721, ignition.msgs.StringMsg, ignition.msgs.Empty
  tcp://172.17.0.1:43201, ignition.msgs.StringMsg, ignition.msgs.Empty

$ ign service -s /oneway --reqtype ignition.msgs.StringMsg --reptype ignition.msgs.Empty --timeout 2000 --req 'data: "Hello"'
Request received: [Hello]
Service call timed out

So it seems the discovery service is perfectly fine with having multiple providers of a service, but the client gets confused (the console only has one Request received... line, which means only one of the service responders got called). The timeout is weird.

I think it's wrong that the discovery service registers the other responser (but given the distributed nature of the system, maybe it' s inevitable?). However, Node::Advertise should definitely check for duplicate services before returning true.

caguero commented 3 years ago

Hi @peci1 , this is a design decision. The discovery layer knows about all the services because in case one of them becomes unavailable, the requester node can use the next service up and running. If you have multiple instances of the same service, there's no guarantee of which one will provide the response. You can only assume that one of them will provide the response.

peci1 commented 3 years ago

Ah, then the docs are lacking, because I was searching really thoroughly and haven't found anything about this design decision. I searched both tutorials and source code of the library. I think such non-trivial decision needs to be described in the docs of Node::Advertise() then...

And the timeout shows to be unrelated to the two running responsers - it also happens with just one. It is probably caused by #97.