MaxIV-KitsControls / tango-gateway

A tango gateway server

Bug: wrong network interface used when forwarding CORBA traffic #5

Open stanislaw55 opened 4 years ago

stanislaw55 commented 4 years ago

Hi, I've found what I believe is a bug in Tango Gateway. When the Device Server and the client are in different networks (say A and B) and Tango Gateway is in both of them (it has a network interface with a different address in each of A and B), it can choose the wrong interface to forward traffic. From the client's point of view, it looks like the Device Server is completely down even though it is running properly.

A client in network B connects to the gateway's network interface in network B. The gateway then chooses to forward traffic to its interface for network A.

It's a non-deterministic bug, and it happens for me from time to time when testing with Docker. I'll try to come up with a fix.

vxgmichel commented 4 years ago

Hi @stanislaw55 :)

That's an interesting point. I had a quick look and it turns out it's possible to bind the client socket to a specific interface before connecting:

    reader, writer = await asyncio.open_connection(
        some_host, some_port, local_addr=(local_interface, 0))

This would have to be added here: https://github.com/MaxIV-KitsControls/tango-gateway/blob/5170d47fcaa16e383af7741affcbc2155dbec087/tangogateway/gateway.py#L78-L79
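
To make the `local_addr` idea concrete, here is a minimal, self-contained sketch. It is not the gateway's code: the helper names `connect_from` and `demo` are made up for illustration, and loopback (`127.0.0.1`) stands in for one of the gateway's real interfaces. It only shows that the outgoing socket ends up bound to the requested local address.

```python
import asyncio


async def handle(reader, writer):
    # Trivial server side for the demo: close the connection right away.
    writer.close()


async def connect_from(local_interface, host, port):
    """Connect to (host, port) with the outgoing socket bound to local_interface."""
    reader, writer = await asyncio.open_connection(
        host, port, local_addr=(local_interface, 0))  # port 0: let the OS pick
    # The local end of the connection reports the address it was bound to.
    source_ip = writer.get_extra_info("sockname")[0]
    writer.close()
    await writer.wait_closed()
    return source_ip


async def demo():
    # Hypothetical stand-in for a Tango server: a throwaway loopback listener.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await connect_from("127.0.0.1", "127.0.0.1", port)
    finally:
        server.close()


source_ip = asyncio.run(demo())
print(source_ip)
```

On a machine with two interfaces, passing the address of the interface that faces the Tango server as `local_interface` should force the outgoing connection through that interface, assuming routing allows it.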

Also, note that the server interface can already be specified using the -b/--bind argument: https://github.com/MaxIV-KitsControls/tango-gateway/blob/befdbec6e9ad97be965ecc608df103ec2f634f7f/tangogateway/cli.py#L35-L38

Accepting both interfaces as command line arguments would be pretty neat! I'm not sure what the terminology should be though:

tango client <-> tango gateway server | tango gateway client <-> tango server

What should the left and right parts of this diagram be called?

stanislaw55 commented 4 years ago

Hi @vxgmichel, thanks for the quick answer! I've been inspecting the code and I suspect that the order of the interfaces matters. If the first interface in the machine/container is the internal one (the one in the network with the Device Server and the real database), then that one is used when binding the address; this is clear from the code.

I forgot to mention previously that the problem is only related to events using ZMQ. Synchronous communication using CORBA works just fine.

Bind address works as local_addr, but only for the very first connection.

I agree that this project really needs proper terminology. Personally, I have network A with the real TANGO database and the only server, and network B with just TANGO clients. Tango Gateway sits in between.

I'll try setting local_addr and report back.

stanislaw55 commented 4 years ago

@vxgmichel, the code you pointed to only checks connectivity. The actual connection is made here (I think): https://github.com/MaxIV-KitsControls/tango-gateway/blob/befdbec6e9ad97be965ecc608df103ec2f634f7f/tangogateway/gateway.py#L151

After investigating the code, it seems to work like this: during startup, the value of bind_address is used when forwarding CORBA traffic. For subsequent connections, the get_host_name function is used to get the hostname, which is passed to asyncio.start_server. The result is then queried with server.sockets[0].getsockname(), and hence we get the IP address of the first interface.

I think that when asyncio.start_server gets a hostname, it takes whatever IP address comes first. I think the explicit value of bind_address should be passed instead.
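
A small sketch of the difference, under assumptions: `listening_addresses` is a hypothetical helper, not part of tango-gateway, and `127.0.0.1` stands in for bind_address. When asyncio.start_server is given an explicit IP address, it opens one listening socket on that address, so server.sockets[0].getsockname() is unambiguous. With a hostname that resolves to several addresses it may open several sockets, and index 0 is simply whichever one came first.

```python
import asyncio


async def handle(reader, writer):
    # No traffic is exchanged in this demo; just close any connection.
    writer.close()


async def listening_addresses(host):
    """Return the address of every socket asyncio.start_server opens for host."""
    server = await asyncio.start_server(handle, host, 0)  # port 0: any free port
    addresses = [sock.getsockname()[0] for sock in server.sockets]
    server.close()
    await server.wait_closed()
    return addresses


# Explicit address (standing in for bind_address): exactly one socket,
# so sockets[0] can only be the address that was asked for.
explicit = asyncio.run(listening_addresses("127.0.0.1"))
print(explicit)
```

Calling the same helper with the machine's hostname instead of an IP address is a quick way to check how many sockets are opened in a given container and in which order.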

@vxgmichel what do you think about my idea?