epics-extensions / ca-gateway

Channel Access PV Gateway
http://www.aps.anl.gov/epics/extensions/gateway/

cagateway in a Kubernetes environment doesn't reconnect to IOCs running in pods after an IP change, even though the DNS names stay the same #53

Closed amichelotti closed 3 months ago

amichelotti commented 3 months ago

I use cagateway as a gateway for a Kubernetes installation with several softIOCs inside. Each IOC runs in a pod with a fixed DNS name, but the IP can change if the pod/IOC is restarted. In this environment, everything works fine until a pod/IOC is restarted. In this case, the cagateway never reconnects to the pod, even though its EPICS_CA_ADDR_LIST contains the DNS name of the softIOC. I suspect the gateway was written long before the advent of cloud computing and does not expect DNS names to have dynamic IP addresses. Is that correct? Is it possible to handle this new use case, which I expect will become more and more widespread? Thanks

ralphlange commented 3 months ago

All of that is part of the CA client library in EPICS Base that the Gateway uses.

Does a "camonitor" command line client reconnect? If it shows the same behavior as the Gateway, the cause is in EPICS Base.

amichelotti commented 3 months ago

I haven't tried yet; I will, and I'll let you know (even if I think I know the answer). Do you think the cagateway could add such a feature to work around the EPICS Base limitation?

ralphlange commented 3 months ago

I suspect this is all happening inside the CA client library - i.e., not part of the Gateway code. As soon as the CA client library supports it, the Gateway will.

ralphlange commented 3 months ago

I guess your workaround would be to set the ADDR_LIST to the broadcast address of the pod-internal network, to reach all IOC containers on all IP addresses. Why do you need specific settings?

amichelotti commented 3 months ago

I'm not sure about broadcast on the k8s internal network. However, my k8s EPICS architecture has not only softIOCs that stay inside the cluster but also IOCs outside it. So I have one (or more) cagateway running inside k8s that holds the internal and external DNS addresses of the IOCs and exposes just one entry point to the outside through a load balancer. Can I use ADDR_LIST with a mix of broadcast and fixed addresses?

ralphlange commented 3 months ago

ADDR_LIST can take such a mix, also with ":<port>" suffix on any entry in case IOCs don't use the default ports.
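To make the entry format concrete, here is a minimal Python sketch of how a whitespace-separated list with optional ":<port>" suffixes could be split into host/port pairs. This is not the parser used by the CA client library; the function name `parse_addr_list` and the example host names are made up for illustration. The fallback port 5064 is the usual default CA server port (normally taken from EPICS_CA_SERVER_PORT).

```python
import os

CA_DEFAULT_PORT = 5064  # usual default CA server port


def parse_addr_list(value, default_port=CA_DEFAULT_PORT):
    """Split a whitespace-separated address list into (host, port) pairs.

    Illustrative only: entries are IPv4 addresses or host names, each with
    an optional ':<port>' suffix.
    """
    entries = []
    for token in value.split():
        host, sep, port = token.rpartition(":")
        if sep:
            entries.append((host, int(port)))
        else:
            entries.append((token, default_port))
    return entries


if __name__ == "__main__":
    # Hypothetical mix of one broadcast address and two named IOCs.
    example = "10.42.255.255 ioc-a.example.internal ioc-b.example.internal:5074"
    for host, port in parse_addr_list(os.environ.get("EPICS_CA_ADDR_LIST", example)):
        print(f"{host} -> port {port}")
```

Whether a broadcast entry is actually delivered still depends on the network underneath; the list format itself does not care.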

amichelotti commented 3 months ago

OK, thanks!! I'll try this solution as well, even if I would have preferred not to use broadcasts (and honestly I don't know how they propagate in the k8s network).

ralphlange commented 3 months ago

Your request is still ... at least interesting.

Can you open a ticket for it in EPICS Base? As mentioned, the CA client library would need such a change.

ralphlange commented 3 months ago

FWIW: In the EPICS community, it's probably the Diamond Light Source colleagues who have the most experience with running IOCs inside k8s.

amichelotti commented 3 months ago

I know them well! I'm asking them in parallel (they use the host network, so they are not affected). We are doing something very close to what DLS does: https://github.com/epics-containers

amichelotti commented 3 months ago

Your request is still ... at least interesting.

Can you open a ticket for it in EPICS Base? As mentioned, the CA client library would need such a change.

Yes! First, though, I'll run the checks you recommended to make sure everything behaves as expected. This will allow me to report on broadcast functionality on k8s as well. Additionally, I believe it would be beneficial to move this discussion to a technical forum such as the tech-talk channel. Dockerization and Kubernetes (k8s) are becoming increasingly important for the future of our community, and I think a focused discussion on these topics would be valuable.

ralphlange commented 3 months ago

Tech-talk sounds like a good idea. Also, presenting your use case and issues at one of the Core Dev meetings/telecons would certainly help push the topic. (Be aware that the Core Devs might have a few concerns about the use of containerized IOCs, but they always welcome and enjoy technical discussions.)

mdavidsaver commented 3 months ago

... does not expect DNS names to have dynamic IP addresses. ...

Correct. CA (and PVA) peers generally do DNS lookups once on startup.
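The difference between "look up once on startup" and "look up again before reconnecting" can be sketched in a few lines of Python. This is an illustration of the two strategies only, not code from EPICS Base; the class names and the fake resolver used in the demo are hypothetical, and `socket.getaddrinfo` stands in for whatever lookup the real library performs.

```python
import socket


def resolve_ipv4(hostname):
    """Return the first IPv4 address the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return infos[0][4][0]


class ResolveOnce:
    """Looks the name up a single time, as described for CA/PVA peers."""

    def __init__(self, hostname, resolver=resolve_ipv4):
        self.address = resolver(hostname)

    def address_for_reconnect(self):
        # Stale if the pod came back with a new IP under the same name.
        return self.address


class ResolveOnReconnect:
    """Hypothetical alternative: ask DNS again before every reconnect attempt."""

    def __init__(self, hostname, resolver=resolve_ipv4):
        self.hostname = hostname
        self.resolver = resolver

    def address_for_reconnect(self):
        return self.resolver(self.hostname)


if __name__ == "__main__":
    # Fake DNS table so the demo runs without a network.
    dns = {"ioc-a.example.internal": "10.0.0.5"}
    once = ResolveOnce("ioc-a.example.internal", dns.__getitem__)
    again = ResolveOnReconnect("ioc-a.example.internal", dns.__getitem__)
    dns["ioc-a.example.internal"] = "10.0.0.9"  # pod restarted with a new IP
    print("resolve once:        ", once.address_for_reconnect())
    print("resolve on reconnect:", again.address_for_reconnect())
```

With the first strategy, a restarted pod is only found again if the client itself is restarted or if the search does not depend on a cached address at all (broadcast or multicast).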

CA/PVA are designed for environments where the number of peers is large and dynamic, and where configuring any client/server with a list of all servers/clients would be impractical. As a general rule, if I find myself tempted to fill out EPICS_CA_ADDR_LIST with IPs or host names, I stop and think about whether there is some other solution.

I will join Ralph in recommending use of broadcast (or multicast) searching whenever possible.

I use cagateway as a gateway for a Kubernetes installation with several softIOCs inside ...

Could you elaborate? How are you routing traffic to/through your gateway container? Is it somehow dual-homed? If so, what type of network(s) are used?

gilesknap commented 3 months ago

Hi @amichelotti. We have looked at broadcasts inside the cluster network, and it depends on the network policy controller that your cluster has. There are quite a few: https://slashdot.org/software/p/Weave-Net/alternatives

We are currently using Weave, which does support broadcasts. We intend to move to Cilium, which also supports broadcasts. I'm pretty sure that Flannel does not, and that many others do not.

HOWEVER: @ralphlange makes a very interesting comment in the above.

ADDR_LIST can take such a mix, also with ":<port>" suffix on any entry in case IOCs don't use the default ports.

This I did not know. I feel that this is your answer. You can make your IOCs use a NodePort service, with each IOC getting its own port. The address just needs to be that of any cluster node, and traffic will get routed to the correct node. See https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport if you have not used these before.
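As a rough sketch of what the NodePort suggestion would mean for the gateway's configuration, the Python snippet below assembles an address list with one "<node>:<port>" entry per IOC. Everything here is assumed for illustration: the function name `build_addr_list`, the node name, the IOC names, and the port numbers (picked from the default Kubernetes NodePort range, 30000-32767). It only builds the string; it has not been checked against a real cluster.

```python
def build_addr_list(node_host, ioc_ports, extra_entries=()):
    """Build an EPICS_CA_ADDR_LIST-style string for NodePort-exposed IOCs.

    node_host     -- address of any cluster node (hypothetical name below)
    ioc_ports     -- mapping of IOC name to the NodePort assigned to it
    extra_entries -- further entries, e.g. IOCs outside the cluster
    """
    entries = [f"{node_host}:{port}" for _, port in sorted(ioc_ports.items())]
    entries.extend(extra_entries)
    return " ".join(entries)


if __name__ == "__main__":
    addr_list = build_addr_list(
        "k8s-node-1.example.org",
        {"ioc-a": 30064, "ioc-b": 30075},
        extra_entries=["external-ioc.example.org"],
    )
    print(f"EPICS_CA_ADDR_LIST={addr_list}")
```

Because the node address stays put while pods move, the list no longer depends on pod IPs, at the price of maintaining one port per IOC.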

gilesknap commented 3 months ago

Not that I wouldn't love to see CA able to support your use case ! :-)

gilesknap commented 3 months ago

Also, presenting your use case and issues at one of the Core Dev meetings/telecons would certainly help push the topic. (Be aware that the Core Devs might have a few concerns about the use of containerized IOCs, but they always welcome and enjoy technical discussions.)

@ralphlange how could I go about hearing those concerns, please?

amichelotti commented 3 months ago

While your suggestions were insightful and made my initial tests obsolete, I can confirm that DNS resolution is done only at startup, not after a disconnection or server failure. The behavior comes from EPICS Base (I use 7.0.8); camonitor is affected in the same way. Including an entry like "10.255.255.255" in the EPICS_CA_ADDR_LIST variable doesn't work with k8s/OKD's network policy framework.

Although I'm new to EPICS, from an external perspective, automatically reloading iptables and checking whether the entries are still valid based on DNS resolution, at least after disconnections or timeouts, seems like a valuable feature that could streamline certain workflows. In my workflow the table EPICS_CA_ADDR_LIST is automatically filled as I allocate a new IOC/softIOC to run inside/outside k8s.

amichelotti commented 3 months ago

CA (and PVA) peers generally do DNS lookups once on startup

Hi @ralphlange @mdavidsaver, I have finally opened a ticket: https://github.com/epics-base/epics-base/issues/488 . Regardless of the use of Kubernetes and possible workarounds, I consider this an issue/bug. If a DNS name is provided for addressing an IOC, it's reasonable to expect that it would be used for reconnection and retry mechanisms. Moreover, modern technologies increasingly rely on DNS addressing rather than IP addresses, and the assumption that a DNS name maps to an immutable static IP is becoming less valid. Thank you all for your support; I hope the EPICS community will consider my EPICS-beginner point of view.

ralphlange commented 3 months ago

@ralphlange how could I go about hearing those concerns, please?

I would say... trigger a discussion at one of the regular Core telecons or - maybe better - at a face-to-face meeting like next week at the codeathon at NSLS-II.

ralphlange commented 3 months ago

Although I'm new to EPICS, from an external perspective, automatically reloading iptables and checking whether the entries are still valid based on DNS resolution, at least after disconnections or timeouts, seems like a valuable feature that could streamline certain workflows. In my workflow the table EPICS_CA_ADDR_LIST is automatically filled as I allocate a new IOC/softIOC to run inside/outside k8s.

Just to clarify - sorry if I misunderstood you: The iptables mechanism is basically unrelated. EPICS_CA_ADDR_LIST is an environment variable. Unless using dirty hacks, you can't change the environment of a different running process "from the outside". If you want to change the EPICS_CA_ADDR_LIST of a Channel Access client (like the Gateway), it needs to be restarted.
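The point about environment variables can be demonstrated with a small Python sketch: a child process receives a copy of the environment when it is started, and later changes made in the parent do not reach it. The helper name `observe_child` and the example values are hypothetical; the child here is just a Python one-liner standing in for a CA client such as the Gateway.

```python
import os
import subprocess
import sys

# Child: print the variable, wait for one line on stdin, print it again.
CHILD = (
    "import os, sys\n"
    "print(os.environ['EPICS_CA_ADDR_LIST'], flush=True)\n"
    "sys.stdin.readline()\n"
    "print(os.environ['EPICS_CA_ADDR_LIST'], flush=True)\n"
)


def observe_child(initial, changed):
    """Start a child with `initial`, change the parent's value, compare."""
    env = dict(os.environ, EPICS_CA_ADDR_LIST=initial)
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as child:
        before = child.stdout.readline().strip()
        os.environ["EPICS_CA_ADDR_LIST"] = changed  # affects the parent only
        child.stdin.write("\n")
        child.stdin.flush()
        after = child.stdout.readline().strip()
    return before, after


if __name__ == "__main__":
    before, after = observe_child("ioc-a.example.internal", "ioc-b.example.internal")
    print("child saw before the change:", before)
    print("child saw after the change: ", after)
```

Both values reported by the child are the one it was started with, which is why a changed address list only takes effect after a restart.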

That's the central idea of using broadcast or multicast for CA name resolution (or generally any type of service discovery) - you don't have to reconfigure and/or restart the client when a new server is added.

amichelotti commented 3 months ago

Although I'm new to EPICS, from an external perspective, automatically reloading iptables and checking whether the entries are still valid based on DNS resolution, at least after disconnections or timeouts, seems like a valuable feature that could streamline certain workflows. In my workflow the table EPICS_CA_ADDR_LIST is automatically filled as I allocate a new IOC/softIOC to run inside/outside k8s.

Just to clarify - sorry if I misunderstood you: The iptables mechanism is basically unrelated. EPICS_CA_ADDR_LIST is an environment variable. Unless using dirty hacks, you can't change the environment of a different running process "from the outside". If you want to change the EPICS_CA_ADDR_LIST of a Channel Access client (like the Gateway), it needs to be restarted.

That's the central idea of using broadcast or multicast for CA name resolution (or generally any type of service discovery) - you don't have to reconfigure and/or restart the client when a new server is added.

Hi Ralph, clear! No dirty hacks!! It's just that if I specify EPICS_CA_ADDR_LIST=<myiocdnsname>, I expect the CA protocol to maintain the connection even if <myiocdnsname> changes its IP, which is something that may happen. Thank you for your support!

ralphlange commented 3 months ago

Good. The topic has been moved to epics-base/epics-base#488 - closing this one as "invalid".