What happened?
I am observing a race condition between the NetworkPolicyController and the NetworkServicesController when updating IPVS entries. The scenario is as follows:
There is a service that has an ExternalIP associated with it.
A new pod that the service targets starts on a host.
kube-router runs the periodic syncIpvsFirewall and adds the ExternalIP to the kube-router-svip-prt ipset. From this point, traffic to the ExternalIP coming from other nodes starts being ACCEPT-ed by iptables. At this stage, NetworkServicesController also adds the ExternalIP to the ipSetHandlers map it maintains in memory.
Something triggers a network policy sync, and kube-router runs syncNetworkPolicyChains. This refreshes ipsets to include IPs contained in NetworkPolicies, starting from the in-memory values that the NetworkPolicyController holds in its ipSetHandlers.
The NetworkPolicyController ipSetHandlers map doesn't know anything about the ExternalIP that was added by the NetworkServicesController, and hence the ExternalIP is removed from kube-router-svip-prt. Traffic to the ExternalIP gets REJECT-ed by iptables until syncIpvsFirewall runs again.
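The sequence above can be reproduced in miniature with a small Go simulation. This is only a sketch of the suspected mechanism, not kube-router code: the kernel and controller types, the restore method, and the a.b.c.d,tcp:80 entry are all hypothetical stand-ins, and it assumes the policy controller's in-memory copy of the set was loaded before the ExternalIP was added.

```go
package main

import "fmt"

// kernel stands in for the ipsets loaded on the host: set name -> members.
type kernel map[string]map[string]bool

// controller is a hypothetical stand-in for either kube-router controller.
// Each one keeps its own private in-memory copy of the sets it knows about.
type controller struct {
	sets map[string]map[string]bool
}

// restore overwrites every set the controller knows about with its own
// in-memory members, discarding whatever the kernel held for that set.
func (c *controller) restore(k kernel) {
	for set, members := range c.sets {
		fresh := make(map[string]bool, len(members))
		for m := range members {
			fresh[m] = true
		}
		k[set] = fresh
	}
}

// simulate replays the scenario and reports whether the ExternalIP entry is
// present in the kernel set after each controller's restore.
func simulate() (afterServices, afterPolicy bool) {
	const set = "kube-router-svip-prt"
	const entry = "a.b.c.d,tcp:80" // placeholder ExternalIP entry

	k := kernel{set: {}}
	services := &controller{sets: map[string]map[string]bool{set: {}}}
	// Assumption: the policy controller's copy predates the new entry.
	policy := &controller{sets: map[string]map[string]bool{set: {}}}

	// The services sync adds the entry to its own map and restores.
	services.sets[set][entry] = true
	services.restore(k)
	afterServices = k[set][entry]

	// The policy sync restores from its stale map and drops the entry.
	policy.restore(k)
	afterPolicy = k[set][entry]
	return afterServices, afterPolicy
}

func main() {
	afterServices, afterPolicy := simulate()
	fmt.Println("present after services sync:", afterServices)
	fmt.Println("present after policy sync:", afterPolicy)
}
```

In this model the entry is present after the services restore and gone after the policy restore, which matches the behaviour described above.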
What did you expect to happen?
The ExternalIPs of services should be added to the kube-router-svip-prt ipset and remain there, instead of getting removed and re-added.
How can we reproduce the behavior you experienced?
Steps to reproduce the behavior:
Have a service with an ExternalIP added to it, say a.b.c.d.
Spin up a new pod targeted by the service.
On the host where the pod started, observe the content of the kube-router-svip-prt ipset with: ipset list kube-router-svip-prt | grep -P "a\.b\.c\.d"
The IP will be there after kube-router runs syncIpvsFirewall and will disappear when kube-router runs fullPolicySync.
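To watch for the flapping without eyeballing grep output, the membership check can be scripted. The sketch below parses text in the shape that ipset list prints (a "Members:" line followed by one entry per line) and reports whether a given IP is a member. The hasMember helper is hypothetical, the sample output in main is made up for illustration (including the tcp:443 port), and it assumes entries look like IP,proto:port followed by optional flags.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasMember reports whether ip appears as a member in the text printed by
// `ipset list <set>`. Members follow the "Members:" line; for entries of
// the form "IP,proto:port ..." only the part before the comma is compared.
func hasMember(listOutput, ip string) bool {
	inMembers := false
	sc := bufio.NewScanner(strings.NewReader(listOutput))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "Members:" {
			inMembers = true
			continue
		}
		if !inMembers || line == "" {
			continue
		}
		// The first field is the entry; later fields are flags like "timeout 0".
		entry := strings.Fields(line)[0]
		addr, _, _ := strings.Cut(entry, ",")
		if addr == ip {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative output only; real output would come from running
	// `ipset list kube-router-svip-prt` on the host.
	sample := "Name: kube-router-svip-prt\n" +
		"Members:\n" +
		"87.250.179.246,tcp:443 timeout 0\n"
	fmt.Println(hasMember(sample, "87.250.179.246"))
}
```

Running this check in a loop against the real command output, with a timestamp per sample, should show the entry appearing after syncIpvsFirewall and disappearing after a policy sync if the race is present.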
System Information (please complete the following information)
Kube-Router Version (kube-router --version):
Running kube-router version v2.1.0-11-gac6b898c, built on 2024-03-18T20:39:38+0100, go1.22.0
Kube-Router Parameters:
Kubernetes Version (kubectl version): 1.27.13
Cloud Type: on premise
Kubernetes Deployment Type: custom scripts
Kube-Router Deployment Type: System service
Cluster Size: The test cluster is 10 nodes; prod clusters are around 100 nodes
Logs, other output, metrics
This is what I see in the logs (I extracted the relevant parts; the full run is attached).
When ipsets are restored by the NetworkServicesController, kube-router-svip-prt contains 87.250.179.246, while when they are restored by the NetworkPolicyController, 87.250.179.246 is missing.
I am patching the issue for now by running ipset.Save() at each controller before they build their updated version, to make sure the base layer is the current config instead of the previous in-memory content, which might be outdated.
kube-router-ipset-race.log
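The idea behind the workaround can be illustrated with a toy model in Go. This is a sketch under stated assumptions rather than the actual patch: the kernel and controller types and the save, restore, and policySync functions are hypothetical, with save playing the role that ipset.Save() plays in the workaround (reloading the in-memory copy from the live state before anything is rebuilt).

```go
package main

import "fmt"

// kernel stands in for the ipsets loaded on the host: set name -> members.
type kernel map[string]map[string]bool

// controller is a hypothetical controller with a private in-memory copy.
type controller struct {
	sets map[string]map[string]bool
}

// save replaces the in-memory copy with what the kernel currently holds.
func (c *controller) save(k kernel) {
	c.sets = make(map[string]map[string]bool, len(k))
	for set, members := range k {
		cp := make(map[string]bool, len(members))
		for m := range members {
			cp[m] = true
		}
		c.sets[set] = cp
	}
}

// restore writes the in-memory copy back, replacing each known set.
func (c *controller) restore(k kernel) {
	for set, members := range c.sets {
		cp := make(map[string]bool, len(members))
		for m := range members {
			cp[m] = true
		}
		k[set] = cp
	}
}

// policySync models one policy sync. With refreshFirst set, the controller
// reloads the live state before restoring, as in the workaround.
func policySync(c *controller, k kernel, refreshFirst bool) {
	if refreshFirst {
		c.save(k)
	}
	// A real sync would rebuild the policy-owned sets in c.sets here.
	c.restore(k)
}

// run reports whether the ExternalIP entry survives a policy sync.
func run(refreshFirst bool) bool {
	const set = "kube-router-svip-prt"
	const entry = "a.b.c.d,tcp:80" // placeholder ExternalIP entry

	k := kernel{set: {entry: true}} // already added by the services sync
	policy := &controller{sets: map[string]map[string]bool{set: {}}} // stale copy
	policySync(policy, k, refreshFirst)
	return k[set][entry]
}

func main() {
	fmt.Println("survives without refresh:", run(false))
	fmt.Println("survives with refresh:", run(true))
}
```

In this model the entry is lost when the stale copy is restored directly and survives when the copy is refreshed first. A refresh-then-restore sequence still leaves a window if the other controller writes between the two steps, so a shared handler or a lock around both steps may be a more robust long-term fix.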