cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

Race condition between netpol and IPVS based ipset updates #1732

Open alexcriss opened 1 month ago

alexcriss commented 1 month ago

What happened?

I am observing a race condition between the NetworkPolicyController and the NetworkServicesController when updating IPVS entries. The scenario is as follows:

What did you expect to happen?

The ExternalIPs of services should be added to the kube-router-svip-prt ipset and remain there, instead of getting removed and re-added.

How can we reproduce the behavior you experienced?

Steps to reproduce the behavior:

  1. Have a service with an ExternalIP added to it, say a.b.c.d.
  2. Spin up a new pod targeted by the service.
  3. On the host where the pod started, observe the content of the kube-router-svip-prt ipset with: ipset list kube-router-svip-prt | grep -P "a\.b\.c\.d"
  4. The IP will be there after kube-router runs syncIpvsFirewall and will disappear when kube-router runs fullPolicySync.

System Information (please complete the following information)

Logs, other output, metrics

This is what I see in the logs (I extracted the relevant parts; the full run is attached):

Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: I0905 08:29:55.259444  248635 service_endpoints_sync.go:87] Syncing IPVS Firewall
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: I0905 08:29:55.259452  248635 network_services_controller.go:612] Attempting to attain ipset mutex lock
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: I0905 08:29:55.259460  248635 network_services_controller.go:614] Attained ipset mutex lock, continuing...
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: I0905 08:29:55.265608  248635 ipset.go:564] 
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: create TMP-TF3INM4IEYGA443O hash:ip,port timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: flush TMP-TF3INM4IEYGA443O
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.28.239,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 87.250.179.244,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 87.250.179.246,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.216.84,tcp:25 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 87.250.179.242,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.36.204,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.232.251,tcp:8080 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.50,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.28.239,tcp:8080 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.204.62,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.248.172,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.79.217,tcp:11233 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.232.251,tcp:9666 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.69,tcp:2379 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.216.84,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.140.191,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.64,tcp:2379 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.65,tcp:2379 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.16.186,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.66,tcp:2379 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.67,tcp:2379 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.48.115,tcp:2379 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.10,tcp:53 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.16.131,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.118.158,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.183.152,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.10,udp:53 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.16.131,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.200.95,tcp:8080 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.127.69,tcp:9402 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.118.158,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.183.152,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.189.55,tcp:9443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.200.95,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.140.191,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.16.186,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.10,tcp:9153 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.204.62,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.2.10,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.0.1,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.10.1,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.189.55,tcp:8080 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 87.250.179.246,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.108.132,tcp:9222 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 87.250.179.244,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 87.250.179.242,tcp:80 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-TF3INM4IEYGA443O 172.30.148.17,tcp:443 timeout 0
Sep 05 08:29:55 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: create kube-router-svip-prt hash:ip,port timeout 0
...
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: I0905 08:30:00.159752  248635 network_policy_controller.go:195] Received request for a full sync, processing
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: I0905 08:30:00.159764  248635 network_policy_controller.go:243] Starting sync of iptables with version: 1725525000159758032
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.216.84,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.248.172,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.16.186,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.67,tcp:2379 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.64,tcp:2379 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.118.158,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.183.152,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.127.69,tcp:9402 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.28.239,tcp:8080 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.10,tcp:53 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.10,tcp:9153 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.200.95,tcp:8080 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.2.10,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.140.191,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 87.250.179.244,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.65,tcp:2379 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.232.251,tcp:9666 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.189.55,tcp:9443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.16.131,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.16.186,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.66,tcp:2379 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.232.251,tcp:8080 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.118.158,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.1,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.140.191,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.10.1,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.204.62,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.50,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.183.152,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.10,udp:53 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.48.115,tcp:2379 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.16.131,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.0.69,tcp:2379 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.204.62,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 87.250.179.244,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 87.250.179.242,tcp:80 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.200.95,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.189.55,tcp:8080 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.148.17,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.79.217,tcp:11233 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.216.84,tcp:25 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.108.132,tcp:9222 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.36.204,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 87.250.179.242,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: add TMP-DEZFJSJBULNQ6H3V 172.30.28.239,tcp:443 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: create kube-router-svip-prt hash:ip,port family inet hashsize 1024 maxelem 65536 timeout 0
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: swap TMP-DEZFJSJBULNQ6H3V kube-router-svip-prt
Sep 05 08:30:00 vip-k8s-general-C12-36.dfw.vipv2.net kube-router[248635]: flush TMP-DEZFJSJBULNQ6H3V

When the ipsets are restored by the NetworkServicesController, kube-router-svip-prt contains 87.250.179.246. When they are restored by the NetworkPolicyController, 87.250.179.246 is missing.

For now I am working around the issue by calling ipset.Save() in each controller before it builds its updated version, so that the base layer is the current configuration rather than the previous in-memory content, which might be outdated.

Attached: kube-router-ipset-race.log
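The effect described above can be modelled in a few lines of Go. This is a simplified sketch under stated assumptions, not kube-router's actual code: the kernel and controller types, their save/restore/add methods, and the KUBE-DST-EXAMPLE set name are all hypothetical. It assumes that each controller keeps its own in-memory copy of every set and writes the whole copy back on restore, which is how the behaviour in the logs reads.

```go
package main

import "fmt"

// kernel stands in for the host's ipset state: set name -> members.
type kernel map[string]map[string]bool

// controller is a hypothetical component that caches all sets in memory
// and writes its whole cache back to the host on restore.
type controller struct {
	cache kernel
}

// copySets returns a deep copy so cache and host never share maps.
func copySets(src kernel) kernel {
	dst := make(kernel, len(src))
	for name, members := range src {
		dst[name] = make(map[string]bool, len(members))
		for m := range members {
			dst[name][m] = true
		}
	}
	return dst
}

// save refreshes the in-memory copy from the current host state.
func (c *controller) save(host kernel) { c.cache = copySets(host) }

// restore overwrites every cached set on the host with the cached content.
func (c *controller) restore(host kernel) {
	for name, members := range copySets(c.cache) {
		host[name] = members
	}
}

// add records a member in the in-memory copy only.
func (c *controller) add(set, member string) {
	if c.cache[set] == nil {
		c.cache[set] = make(map[string]bool)
	}
	c.cache[set][member] = true
}

// run replays the sequence from the logs and reports whether the external
// IP survives the policy sync. saveFirst toggles the workaround.
func run(saveFirst bool) bool {
	host := kernel{"kube-router-svip-prt": {"172.30.0.1,tcp:443": true}}
	services, policy := &controller{}, &controller{}
	services.save(host)
	policy.save(host) // both controllers start from the same snapshot

	// Services sync: add the external IP and restore.
	services.add("kube-router-svip-prt", "87.250.179.246,tcp:443")
	services.restore(host)

	// Policy sync: optionally refresh the cache, then restore everything.
	if saveFirst {
		policy.save(host)
	}
	policy.add("KUBE-DST-EXAMPLE", "10.0.0.5") // hypothetical policy set
	policy.restore(host)

	return host["kube-router-svip-prt"]["87.250.179.246,tcp:443"]
}

func main() {
	fmt.Println("external IP kept without save:", run(false)) // false
	fmt.Println("external IP kept with save:   ", run(true))  // true
}
```

In this model the mutex seen in the logs does not help, because the problem is not concurrent access: the stale copy is written back after the lock has been acquired correctly. Refreshing the copy before building the update is what keeps the entry in place.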

github-actions[bot] commented 6 days ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

walthowd commented 6 days ago

Not stale.