BenChung opened this issue 5 months ago
Okay, I figured out a slice of the problem. The issue is that the initial peers discovery method is first used to handshake (successfully) between the two DDS Router instances, at which point the server's connection-addresses locator is used for communication instead of the initial peers domain. I'd really prefer the initial peers value to be used as the locator after discovery, rather than the discovered locator, since the server may be reachable through a variety of interfaces (for example, it may be available on different IPs inside the subnet as well as on an externally-facing IP visible to the wider internet).
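For reference, here is a rough fragment of the kind of connecting-side WAN participant configuration I mean (the participant name, domain, and port are placeholders rather than my actual values):

```yaml
# Connecting side (initial peers): the address under connection-addresses
# is what I'd like to keep using as the locator even after discovery.
participants:
  - name: wan_client                 # placeholder name
    kind: wan                        # initial-peers WAN participant
    connection-addresses:
      - domain: router.example.com   # externally reachable name (placeholder)
        port: 11666                  # placeholder port
        transport: tcp
```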
Hi @BenChung,
I am not exactly sure what your use case is, or why you are using this configuration setup. However, I'm going to point to a few things I find odd, and hopefully that might shed some light on the matter.
- Regarding the `discovery-trigger: any` option, this might result in endpoints not properly matching due to QoS incompatibilities. I suggest using the default value (`discovery-trigger: reader`).
- In the `domain` tag a DNS domain is expected, not an IP. I don't know if this might be generating issues (it could actually be treated as an IP due to implementation details, I'd need to verify).
- We do not test `0.0.0.0` IPs in our configurations. It might actually work, but as I said it's not tested from our side. I suggest benefiting from the Docker compose DNS service and setting domains to be service names (see the sketch below).

Regards
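As a rough illustration of that last suggestion (service names, network name, and port below are placeholders, not taken from your setup), Docker compose's built-in DNS lets each router reach the other by service name:

```yaml
# docker-compose.yml (illustrative sketch)
services:
  router-a:
    image: router-base
    networks: [net]
  router-b:
    image: router-base
    networks: [net]
networks:
  net:
```

```yaml
# Fragment of router-a's DDS Router config: the peer can then be
# addressed by service name in the domain tag instead of an IP.
connection-addresses:
  - domain: router-b   # resolved by Compose's DNS
    port: 11666        # placeholder port
    transport: tcp
```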
Hi, and thank you for the help! I was trying the `discovery-trigger: any` option as a "sticks against the wall" debugging approach.
The issue that was proximally keeping this from working was the `0.0.0.0` IPs. I'd really like one side of this (call it the "server side") to use `0.0.0.0` or a similar IP so that it doesn't have to be aware of the ingress approach. It's available under several different ports, IPs, and domain names in the ultimate configuration, and it would be nice if we didn't have to nail that down to a finite list.

As far as I can tell, what happens right now is that the WAN participant instances, with one set to `0.0.0.0`, will start communicating under initial peers... but once discovery(?) information has been exchanged, the other side will use the domain or IP from the locator its peer provides. In the case of a `0.0.0.0` IP, this defaults to the system's interface addresses, which really doesn't work in my setup. What I'd like is for the WAN participants to keep communicating over the connection (IP/domain and port) originally specified in the connecting side's configuration. This then allows me to set up the "overall" server to be ignorant of how it's reached (k8s ingress, direct pod-to-pod addressing, a proxy, etc.).
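To make that concrete, roughly the split I'm after looks like the following fragments (names and port are placeholders). The problem described above is that, after discovery, the connecting side stops using the domain below and switches to the interface addresses announced by the `0.0.0.0` listener:

```yaml
# "Server" side: listens on all interfaces, deliberately unaware of how
# it is exposed (k8s ingress, proxy, direct pod-to-pod, ...).
participants:
  - name: wan_server                 # placeholder
    kind: wan
    listening-addresses:
      - ip: 0.0.0.0
        port: 11666                  # placeholder port
        transport: tcp
---
# "Client" side: this is the address I'd like it to keep using after
# discovery, instead of the locator announced by the 0.0.0.0 listener.
participants:
  - name: wan_client                 # placeholder
    kind: wan
    connection-addresses:
      - domain: server.example.com   # ingress / proxy / external name (placeholder)
        port: 11666
        transport: tcp
```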
I can make a more specific bug report or feature request along these lines, but I suspect that what I describe is sufficiently alien to the locator model that it's hard to realize.
I have a test setup of four containers: two are running the image `router-base` derived from the dockerfile, and the other two are derived from `node-base`. I then orchestrate them using the following Docker compose file, with the configs `config.yaml` and `config2.yaml`:
If I bring the ensemble up with `docker compose --profile good up`, everything works, but if I bring it up with the other router and client on a different virtual network `netB` using `docker compose --profile bad up`, then it doesn't work. The `tcpdump` data that's generated shows that in both cases the routers are regularly communicating via TCP in patterns that are very similar. However, they don't appear to be cross-publishing the `talker` messages, and thus when on different networks the clients aren't able to communicate.