cloudnativelabs / kube-router

Kube-router, a turnkey solution for Kubernetes networking.
https://kube-router.io
Apache License 2.0

Creating LoadBalancer service blocks API server IP #1685

Closed k6av closed 2 months ago

k6av commented 3 months ago

What happened? After creating a service with type set to LoadBalancer, the Kubernetes API server becomes unreachable.

What did you expect to happen? API server accessibility is unaffected.

How can we reproduce the behavior you experienced? Steps to reproduce the behavior:

  1. Create a load balancer service (kubectl create service loadbalancer example --tcp=1337:1337)
  2. Try to:
    • Access any API server endpoint (kubectl get nodes), or
    • Ping the API server IP address.

System Information:

Logs, other output, metrics

ipset list output before creating the service (API server is reachable)

```
Name: inet6:kube-router-pod-subnets
Type: hash:net
Revision: 7
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0xe7ac8722
Size in memory: 1336
References: 2
Number of entries: 1
Members:
fdcd:5f23:4ab4:10::/64 timeout 0

Name: inet6:kube-router-node-ips
Type: hash:ip
Revision: 6
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0xe1f932ec
Size in memory: 304
References: 1
Number of entries: 1
Members:
fdcd:0cd1:5dd1:28::1 timeout 0

Name: inet6:kube-router-local-ips
Type: hash:ip
Revision: 6
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0x5689a634
Size in memory: 224
References: 1
Number of entries: 0
Members:

Name: inet6:kube-router-svip
Type: hash:ip
Revision: 6
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0x978ab45a
Size in memory: 384
References: 1
Number of entries: 2
Members:
fdcd:5f23:4ab4:11::1 timeout 0
fdcd:5f23:4ab4:11::10 timeout 0

Name: inet6:kube-router-svip-prt
Type: hash:ip,port
Revision: 7
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0xf49105d3
Size in memory: 616
References: 1
Number of entries: 4
Members:
fdcd:5f23:4ab4:11::10,tcp:53 timeout 0
fdcd:5f23:4ab4:11::10,tcp:9153 timeout 0
fdcd:5f23:4ab4:11::10,udp:53 timeout 0
fdcd:5f23:4ab4:11::1,tcp:443 timeout 0
```

ipset list output after creating the service (API server is unreachable)

```
Name: inet6:kube-router-pod-subnets
Type: hash:net
Revision: 7
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0xe7ac8722
Size in memory: 1336
References: 2
Number of entries: 1
Members:
fdcd:5f23:4ab4:10::/64 timeout 0

Name: inet6:kube-router-node-ips
Type: hash:ip
Revision: 6
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0xe1f932ec
Size in memory: 304
References: 1
Number of entries: 1
Members:
fdcd:0cd1:5dd1:28::1 timeout 0

Name: inet6:kube-router-local-ips
Type: hash:ip
Revision: 6
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0x9e1e76c3
Size in memory: 224
References: 1
Number of entries: 0
Members:

Name: inet6:kube-router-svip
Type: hash:ip
Revision: 6
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0x5689a634
Size in memory: 544
References: 1
Number of entries: 4
Members:
fdcd:5f23:4ab4:11::5402 timeout 0
fdcd:0cd1:5dd1:28::1 timeout 0
fdcd:5f23:4ab4:11::10 timeout 0
fdcd:5f23:4ab4:11::1 timeout 0

Name: inet6:kube-router-svip-prt
Type: hash:ip,port
Revision: 7
Header: family inet6 hashsize 1024 maxelem 65536 timeout 0 bucketsize 12 initval 0xf31f40be
Size in memory: 808
References: 1
Number of entries: 6
Members:
fdcd:5f23:4ab4:11::10,udp:53 timeout 0
fdcd:5f23:4ab4:11::5402,tcp:1337 timeout 0
fdcd:5f23:4ab4:11::1,tcp:443 timeout 0
fdcd:5f23:4ab4:11::10,tcp:53 timeout 0
fdcd:5f23:4ab4:11::10,tcp:9153 timeout 0
fdcd:0cd1:5dd1:28::1,tcp:30868 timeout 0
```
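The difference between the two dumps is easier to spot programmatically. A rough Python sketch (member lists abbreviated from the `inet6:kube-router-svip` sections above; only the parsing approach matters here) that diffs the members of each set:

```python
# Diff the members of each set between two `ipset list` dumps.
# The two snippets are abbreviated from the outputs above.

def parse_ipset_list(text):
    """Map set name -> set of member lines from `ipset list` output."""
    sets, name, in_members = {}, None, False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            name = line.split(None, 1)[1]
            sets[name] = set()
            in_members = False
        elif line.startswith("Members:"):
            in_members = True
        elif in_members and line:
            sets[name].add(line)
    return sets

before = """\
Name: inet6:kube-router-svip
Members:
fdcd:5f23:4ab4:11::1 timeout 0
fdcd:5f23:4ab4:11::10 timeout 0
"""

after = """\
Name: inet6:kube-router-svip
Members:
fdcd:5f23:4ab4:11::5402 timeout 0
fdcd:0cd1:5dd1:28::1 timeout 0
fdcd:5f23:4ab4:11::10 timeout 0
fdcd:5f23:4ab4:11::1 timeout 0
"""

b, a = parse_ipset_list(before), parse_ipset_list(after)
for name in a:
    for entry in sorted(a[name] - b.get(name, set())):
        print(f"{name}: + {entry}")
```

Run against the full dumps, this flags the node IP (`fdcd:0cd1:5dd1:28::1`) and the new load balancer IP as the entries that appeared in `inet6:kube-router-svip`.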

ip6tables -L output before creating the service (API server is reachable)

```
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-INPUT all -- anywhere anywhere /* kube-router netpol - 4IA2OSFRMVNDXBVV */
KUBE-ROUTER-SERVICES all -- anywhere anywhere /* handle traffic to IPVS service IPs in custom chain */ match-set inet6:kube-router-svip dst

Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-FORWARD all -- anywhere anywhere /* kube-router netpol - TEMCG2JMHZYE7H7T */
ACCEPT all -- anywhere anywhere /* allow outbound node port traffic on node interface with which node ip is associated */
ACCEPT all -- anywhere anywhere /* allow inbound traffic to pods */
ACCEPT all -- anywhere anywhere /* allow outbound traffic from pods */

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-OUTPUT all -- anywhere anywhere /* kube-router netpol - VEAAIY32XVBHCSCY */

Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination

Chain KUBE-NWPLCY-DEFAULT (2 references)
target prot opt source destination
MARK all -- anywhere anywhere /* rule to mark traffic matching a network policy */ MARK or 0x10000

Chain KUBE-POD-FW-MCT5QETSQR7LAPG7 (7 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* rule for stateful firewall for pod */ ctstate RELATED,ESTABLISHED
DROP all -- anywhere anywhere /* rule to drop invalid state for pod */ ctstate INVALID
ACCEPT all -- anywhere fdcd:5f23:4ab4:10::2 /* rule to permit the traffic traffic to pods when source is the pod's local node */ ADDRTYPE match src-type LOCAL
KUBE-NWPLCY-DEFAULT all -- fdcd:5f23:4ab4:10::2 anywhere /* run through default egress network policy chain */
KUBE-NWPLCY-DEFAULT all -- anywhere fdcd:5f23:4ab4:10::2 /* run through default ingress network policy chain */
NFLOG all -- anywhere anywhere /* rule to log dropped traffic POD name:coredns-5cfff596b9-sz64c namespace: kube-system */ mark match ! 0x10000/0x10000 limit: avg 10/min burst 10 nflog-group 100
REJECT all -- anywhere anywhere /* rule to REJECT traffic destined for POD name:coredns-5cfff596b9-sz64c namespace: kube-system */ mark match ! 0x10000/0x10000 reject-with icmp6-port-unreachable
MARK all -- anywhere anywhere MARK and 0xfffeffff
MARK all -- anywhere anywhere /* set mark to ACCEPT traffic that comply to network policies */ MARK or 0x20000

Chain KUBE-ROUTER-FORWARD (1 references)
target prot opt source destination
KUBE-POD-FW-MCT5QETSQR7LAPG7 all -- anywhere fdcd:5f23:4ab4:10::2 /* rule to jump traffic destined to POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-MCT5QETSQR7LAPG7 */
KUBE-POD-FW-MCT5QETSQR7LAPG7 all -- anywhere fdcd:5f23:4ab4:10::2 PHYSDEV match --physdev-is-bridged /* rule to jump traffic destined to POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-MCT5QETSQR7LAPG7 */
KUBE-POD-FW-MCT5QETSQR7LAPG7 all -- fdcd:5f23:4ab4:10::2 anywhere /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-MCT5QETSQR7LAPG7 */
KUBE-POD-FW-MCT5QETSQR7LAPG7 all -- fdcd:5f23:4ab4:10::2 anywhere PHYSDEV match --physdev-is-bridged /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-MCT5QETSQR7LAPG7 */
ACCEPT all -- anywhere anywhere /* rule to explicitly ACCEPT traffic that comply to network policies */ mark match 0x20000/0x20000

Chain KUBE-ROUTER-INPUT (1 references)
target prot opt source destination
RETURN all -- anywhere fdcd:5f23:4ab4:11::/112 /* allow traffic to primary/secondary cluster IP range - S45O6WQWYVU3GELQ */
RETURN tcp -- anywhere anywhere /* allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M */ ADDRTYPE match dst-type LOCAL multiport dports ndmps:filenet-powsrm
RETURN udp -- anywhere anywhere /* allow LOCAL UDP traffic to node ports - 76UCBPIZNGJNWNUZ */ ADDRTYPE match dst-type LOCAL multiport dports 30000:filenet-powsrm
RETURN all -- anywhere fdcd:5f23:4ab4:12::/112 /* allow traffic to load balancer IP range: fdcd:5f23:4ab4:12::/112 - PP4T6FE3ON4YJVID */
RETURN tcp -- anywhere anywhere /* allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M */ ADDRTYPE match dst-type LOCAL multiport dports ndmps:filenet-powsrm
RETURN all -- anywhere fdcd:5f23:4ab4:12::/112 /* allow traffic to load balancer IP range: fdcd:5f23:4ab4:12::/112 - PP4T6FE3ON4YJVID */
RETURN tcp -- anywhere anywhere /* allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M */ ADDRTYPE match dst-type LOCAL multiport dports ndmps:filenet-powsrm
RETURN all -- anywhere fdcd:5f23:4ab4:12::/112 /* allow traffic to load balancer IP range: fdcd:5f23:4ab4:12::/112 - PP4T6FE3ON4YJVID */
KUBE-POD-FW-MCT5QETSQR7LAPG7 all -- fdcd:5f23:4ab4:10::2 anywhere /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-MCT5QETSQR7LAPG7 */
ACCEPT all -- anywhere anywhere /* rule to explicitly ACCEPT traffic that comply to network policies */ mark match 0x20000/0x20000

Chain KUBE-ROUTER-OUTPUT (1 references)
target prot opt source destination
KUBE-POD-FW-MCT5QETSQR7LAPG7 all -- anywhere fdcd:5f23:4ab4:10::2 /* rule to jump traffic destined to POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-MCT5QETSQR7LAPG7 */
KUBE-POD-FW-MCT5QETSQR7LAPG7 all -- fdcd:5f23:4ab4:10::2 anywhere /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-MCT5QETSQR7LAPG7 */
ACCEPT all -- anywhere anywhere /* rule to explicitly ACCEPT traffic that comply to network policies */ mark match 0x20000/0x20000

Chain KUBE-ROUTER-SERVICES (1 references)
target prot opt source destination
ACCEPT ipv6-icmp -- anywhere anywhere /* allow icmp echo requests to service IPs */ ipv6-icmp echo-request
ACCEPT ipv6-icmp -- anywhere anywhere /* allow icmp ttl exceeded messages to service IPs */ ipv6-icmp time-exceeded
ACCEPT ipv6-icmp -- anywhere anywhere /* allow icmp destination unreachable messages to service IPs */ ipv6-icmp destination-unreachable
ACCEPT all -- anywhere anywhere /* allow input traffic to ipvs services */ match-set inet6:kube-router-svip-prt dst,dst
REJECT all -- anywhere anywhere /* reject all unexpected traffic to service IPs */ ! match-set inet6:kube-router-local-ips dst reject-with icmp6-port-unreachable
```

ip6tables -L output after creating the service (API server is unreachable)

```
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-INPUT all -- anywhere anywhere /* kube-router netpol - 4IA2OSFRMVNDXBVV */
KUBE-ROUTER-SERVICES all -- anywhere anywhere /* handle traffic to IPVS service IPs in custom chain */ match-set inet6:kube-router-svip dst

Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-FORWARD all -- anywhere anywhere /* kube-router netpol - TEMCG2JMHZYE7H7T */
ACCEPT all -- anywhere anywhere /* allow outbound node port traffic on node interface with which node ip is associated */
ACCEPT all -- anywhere anywhere /* allow inbound traffic to pods */
ACCEPT all -- anywhere anywhere /* allow outbound traffic from pods */

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-ROUTER-OUTPUT all -- anywhere anywhere /* kube-router netpol - VEAAIY32XVBHCSCY */

Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination

Chain KUBE-NWPLCY-DEFAULT (2 references)
target prot opt source destination
MARK all -- anywhere anywhere /* rule to mark traffic matching a network policy */ MARK or 0x10000

Chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 (7 references)
target prot opt source destination
ACCEPT all -- anywhere anywhere /* rule for stateful firewall for pod */ ctstate RELATED,ESTABLISHED
DROP all -- anywhere anywhere /* rule to drop invalid state for pod */ ctstate INVALID
ACCEPT all -- anywhere fdcd:5f23:4ab4:10::2 /* rule to permit the traffic traffic to pods when source is the pod's local node */ ADDRTYPE match src-type LOCAL
KUBE-NWPLCY-DEFAULT all -- fdcd:5f23:4ab4:10::2 anywhere /* run through default egress network policy chain */
KUBE-NWPLCY-DEFAULT all -- anywhere fdcd:5f23:4ab4:10::2 /* run through default ingress network policy chain */
NFLOG all -- anywhere anywhere /* rule to log dropped traffic POD name:coredns-5cfff596b9-sz64c namespace: kube-system */ mark match ! 0x10000/0x10000 limit: avg 10/min burst 10 nflog-group 100
REJECT all -- anywhere anywhere /* rule to REJECT traffic destined for POD name:coredns-5cfff596b9-sz64c namespace: kube-system */ mark match ! 0x10000/0x10000 reject-with icmp6-port-unreachable
MARK all -- anywhere anywhere MARK and 0xfffeffff
MARK all -- anywhere anywhere /* set mark to ACCEPT traffic that comply to network policies */ MARK or 0x20000

Chain KUBE-ROUTER-FORWARD (1 references)
target prot opt source destination
KUBE-POD-FW-RFIAS6HOABJQ3KE2 all -- anywhere fdcd:5f23:4ab4:10::2 /* rule to jump traffic destined to POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 */
KUBE-POD-FW-RFIAS6HOABJQ3KE2 all -- anywhere fdcd:5f23:4ab4:10::2 PHYSDEV match --physdev-is-bridged /* rule to jump traffic destined to POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 */
KUBE-POD-FW-RFIAS6HOABJQ3KE2 all -- fdcd:5f23:4ab4:10::2 anywhere /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 */
KUBE-POD-FW-RFIAS6HOABJQ3KE2 all -- fdcd:5f23:4ab4:10::2 anywhere PHYSDEV match --physdev-is-bridged /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 */
ACCEPT all -- anywhere anywhere /* rule to explicitly ACCEPT traffic that comply to network policies */ mark match 0x20000/0x20000

Chain KUBE-ROUTER-INPUT (1 references)
target prot opt source destination
RETURN all -- anywhere fdcd:5f23:4ab4:11::/112 /* allow traffic to primary/secondary cluster IP range - S45O6WQWYVU3GELQ */
RETURN tcp -- anywhere anywhere /* allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M */ ADDRTYPE match dst-type LOCAL multiport dports ndmps:filenet-powsrm
RETURN udp -- anywhere anywhere /* allow LOCAL UDP traffic to node ports - 76UCBPIZNGJNWNUZ */ ADDRTYPE match dst-type LOCAL multiport dports 30000:filenet-powsrm
RETURN all -- anywhere fdcd:5f23:4ab4:12::/112 /* allow traffic to load balancer IP range: fdcd:5f23:4ab4:12::/112 - PP4T6FE3ON4YJVID */
RETURN tcp -- anywhere anywhere /* allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M */ ADDRTYPE match dst-type LOCAL multiport dports ndmps:filenet-powsrm
RETURN all -- anywhere fdcd:5f23:4ab4:12::/112 /* allow traffic to load balancer IP range: fdcd:5f23:4ab4:12::/112 - PP4T6FE3ON4YJVID */
RETURN tcp -- anywhere anywhere /* allow LOCAL TCP traffic to node ports - LR7XO7NXDBGQJD2M */ ADDRTYPE match dst-type LOCAL multiport dports ndmps:filenet-powsrm
RETURN all -- anywhere fdcd:5f23:4ab4:12::/112 /* allow traffic to load balancer IP range: fdcd:5f23:4ab4:12::/112 - PP4T6FE3ON4YJVID */
KUBE-POD-FW-RFIAS6HOABJQ3KE2 all -- fdcd:5f23:4ab4:10::2 anywhere /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 */
ACCEPT all -- anywhere anywhere /* rule to explicitly ACCEPT traffic that comply to network policies */ mark match 0x20000/0x20000

Chain KUBE-ROUTER-OUTPUT (1 references)
target prot opt source destination
KUBE-POD-FW-RFIAS6HOABJQ3KE2 all -- anywhere fdcd:5f23:4ab4:10::2 /* rule to jump traffic destined to POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 */
KUBE-POD-FW-RFIAS6HOABJQ3KE2 all -- fdcd:5f23:4ab4:10::2 anywhere /* rule to jump traffic from POD name:coredns-5cfff596b9-sz64c namespace: kube-system to chain KUBE-POD-FW-RFIAS6HOABJQ3KE2 */
ACCEPT all -- anywhere anywhere /* rule to explicitly ACCEPT traffic that comply to network policies */ mark match 0x20000/0x20000

Chain KUBE-ROUTER-SERVICES (1 references)
target prot opt source destination
ACCEPT ipv6-icmp -- anywhere anywhere /* allow icmp echo requests to service IPs */ ipv6-icmp echo-request
ACCEPT ipv6-icmp -- anywhere anywhere /* allow icmp ttl exceeded messages to service IPs */ ipv6-icmp time-exceeded
ACCEPT ipv6-icmp -- anywhere anywhere /* allow icmp destination unreachable messages to service IPs */ ipv6-icmp destination-unreachable
ACCEPT all -- anywhere anywhere /* allow input traffic to ipvs services */ match-set inet6:kube-router-svip-prt dst,dst
REJECT all -- anywhere anywhere /* reject all unexpected traffic to service IPs */ ! match-set inet6:kube-router-local-ips dst reject-with icmp6-port-unreachable
```

fdcd:0cd1:5dd1:28::1 is the address of the API server.

aauren commented 3 months ago

Out of curiosity, what is the address of the kube-apiserver in your kubectl config file? If this is a DNS name, please resolve it first before sending it.

k6av commented 3 months ago

Hi, thanks for the reply. The API server is addressed with https://[fdcd:0cd1:5dd1:28::]:6443 everywhere including all kubeconfigs, no DNS involved.

k6av commented 3 months ago

It seems the API server is blocked because the node IP is added to the inet6:kube-router-svip set. Given the name of the set I'd suspect this is not supposed to happen since the node IP is not a service VIP.

aauren commented 3 months ago

Good sleuthing! I got distracted by another problem that I found with IPv6 while I was looking into this one.

I'll try to track down the codepath for that after I see if I can reproduce this.

Out of curiosity, do you see that IP (the one of the node) listed in kubectl describe services -A anywhere?

k6av commented 3 months ago

Out of curiosity, do you see that IP (the one of the node) listed in kubectl describe services -A anywhere?

The node's IP is listed in the endpoint of the kubernetes service since the node hosts the static control plane pods which have host networking, but nowhere else.

I'll try to track down the codepath for that after I see if I can reproduce this.

I'll see if I can collect some logs from the kube-router pod, if that helps.

k6av commented 3 months ago

I'm having a hard time getting logs because the API server is unreachable, but I've dug up some older logs and noticed this excerpt where kube-router adds the node IP to the inet6:kube-router-svip set.

kube-router log excerpt 1

```
I0627 09:33:40.602742 3448 linux_networking.go:298] [tcp]:[10]:[fdcd:5f23:4ab4:12::]:[128]:443 (Flags: ) didn't match any existing IPVS services, creating a new IPVS service
E0627 09:33:40.602809 3448 network_routes_controller.go:300] Failed to enable iptables for bridge. Network policies and service proxy may not work: Sysctl net/bridge/bridge-nf-call-iptables=1 : stat /proc/sys/net/bridge/bridge-nf-call-iptables: no such file or directory (option not found, Does your kernel version support this feature?)
I0627 09:33:40.602848 3448 linux_networking.go:310] Successfully added service: [tcp]:[10]:[fdcd:5f23:4ab4:12::]:[128]:443 (Flags: )
I0627 09:33:40.602869 3448 service_endpoints_sync.go:437] no FW mark found for service, nothing to cleanup
E0627 09:33:40.602872 3448 network_routes_controller.go:306] Failed to enable ip6tables for bridge. Network policies and service proxy may not work: Sysctl net/bridge/bridge-nf-call-ip6tables=1 : stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: no such file or directory (option not found, Does your kernel version support this feature?)
I0627 09:33:40.602881 3448 service_endpoints_sync.go:206] No endpoints detected for service VIP: fdcd:5f23:4ab4:12::, skipping adding endpoints...
I0627 09:33:40.602900 3448 service_endpoints_sync.go:362] no external IP addresses returned for service default:kubernetes skipping...
I0627 09:33:40.602915 3448 service_endpoints_sync.go:64] Setting up NodePort Health Checks for LB services
I0627 09:33:40.602920 3448 network_routes_controller.go:315] Starting network route controller
I0627 09:33:40.602925 3448 nodeport_healthcheck.go:38] Running UpdateServicesInfo for NodePort health check
I0627 09:33:40.602940 3448 nodeport_healthcheck.go:70] Finished UpdateServicesInfo for NodePort health check
I0627 09:33:40.602953 3448 service_endpoints_sync.go:71] Cleaning Up Stale VIPs from dummy interface
I0627 09:33:40.602963 3448 service_endpoints_sync.go:628] Cleaning up if any, old service IPs on dummy interface
I0627 09:33:40.603067 3448 round_trippers.go:466] curl -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kube-router/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Authorization: Bearer " 'https://[fdcd:0cd1:5dd1:28::1]:6443/api/v1/nodes/node01'
I0627 09:33:40.603445 3448 service_endpoints_sync.go:649] Found an IP fe80::c8f7:30ff:fe27:f3b6 which is no longer needed so cleaning up
I0627 09:33:40.603469 3448 linux_networking.go:86] Ignoring link-local IP address: fe80::c8f7:30ff:fe27:f3b6
I0627 09:33:40.603487 3448 service_endpoints_sync.go:78] Cleaning Up Stale VIPs from IPVS
I0627 09:33:40.603768 3448 service_endpoints_sync.go:692] Cleaning up if any, old ipvs service and servers which are no longer needed
I0627 09:33:40.603837 3448 service_endpoints_sync.go:695] Current active service map: { "fdcd:0cd1:5dd1:28::1-tcp-30511": [], "fdcd:0cd1:5dd1:28::1-tcp-30967": [], "fdcd:5f23:4ab4:11::1-tcp-443": [ "fdcd:0cd1:5dd1:28::1:6443" ], "fdcd:5f23:4ab4:11::10-tcp-53": [], "fdcd:5f23:4ab4:11::10-tcp-9153": [], "fdcd:5f23:4ab4:11::10-udp-53": [], "fdcd:5f23:4ab4:11::ddae-tcp-443": [], "fdcd:5f23:4ab4:11::ddae-tcp-80": [], "fdcd:5f23:4ab4:12::-tcp-443": [], "fdcd:5f23:4ab4:12::-tcp-80": [] }
I0627 09:33:40.604326 3448 service_endpoints_sync.go:85] Cleaning Up Stale metrics
I0627 09:33:40.604351 3448 service_endpoints_sync.go:88] Syncing IPVS Firewall
I0627 09:33:40.604362 3448 network_services_controller.go:628] Attempting to attain ipset mutex lock
I0627 09:33:40.604373 3448 network_services_controller.go:630] Attained ipset mutex lock, continuing...
I0627 09:33:40.605616 3448 ipset.go:595] ipset (ipv6? true) restore looks like: create TMP-3TF2IQ5MKRP6NAJL hash:ip timeout 0 family inet6 flush TMP-3TF2IQ5MKRP6NAJL create inet6:kube-router-local-ips hash:ip timeout 0 family inet6 swap TMP-3TF2IQ5MKRP6NAJL inet6:kube-router-local-ips flush TMP-3TF2IQ5MKRP6NAJL add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:12:: timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::ddae timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::10 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::10 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:0cd1:5dd1:28::1 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:0cd1:5dd1:28::1 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::ddae timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::10 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::1 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:12:: timeout 0 create inet6:kube-router-svip hash:ip timeout 0 family inet6 swap TMP-3TF2IQ5MKRP6NAJL inet6:kube-router-svip flush TMP-3TF2IQ5MKRP6NAJL create TMP-54ALMNP65S4KA7CE hash:ip,port timeout 0 family inet6 flush TMP-54ALMNP65S4KA7CE add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:12::,tcp:80 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::ddae,tcp:443 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::10,tcp:53 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::10,udp:53 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:0cd1:5dd1:28::1,tcp:30511 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:0cd1:5dd1:28::1,tcp:30967 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::ddae,tcp:80 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::10,tcp:9153 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::1,tcp:443 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:12::,tcp:443 timeout 0 create inet6:kube-router-svip-prt hash:ip,port timeout 0 family inet6 swap TMP-54ALMNP65S4KA7CE inet6:kube-router-svip-prt flush TMP-54ALMNP65S4KA7CE destroy TMP-3TF2IQ5MKRP6NAJL destroy TMP-54ALMNP65S4KA7CE
I0627 09:33:40.605645 3448 ipset.go:225] running ipset command: path=`/usr/sbin/ipset` args=[restore -exist] stdin `​`​`create TMP-3TF2IQ5MKRP6NAJL hash:ip timeout 0 family inet6 flush TMP-3TF2IQ5MKRP6NAJL create inet6:kube-router-local-ips hash:ip timeout 0 family inet6 swap TMP-3TF2IQ5MKRP6NAJL inet6:kube-router-local-ips flush TMP-3TF2IQ5MKRP6NAJL add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:12:: timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::ddae timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::10 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::10 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:0cd1:5dd1:28::1 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:0cd1:5dd1:28::1 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::ddae timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::10 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::1 timeout 0 add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:12:: timeout 0 create inet6:kube-router-svip hash:ip timeout 0 family inet6 swap TMP-3TF2IQ5MKRP6NAJL inet6:kube-router-svip flush TMP-3TF2IQ5MKRP6NAJL create TMP-54ALMNP65S4KA7CE hash:ip,port timeout 0 family inet6 flush TMP-54ALMNP65S4KA7CE add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:12::,tcp:80 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::ddae,tcp:443 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::10,tcp:53 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::10,udp:53 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:0cd1:5dd1:28::1,tcp:30511 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:0cd1:5dd1:28::1,tcp:30967 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::ddae,tcp:80 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::10,tcp:9153 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:11::1,tcp:443 timeout 0 add TMP-54ALMNP65S4KA7CE fdcd:5f23:4ab4:12::,tcp:443 timeout 0 create inet6:kube-router-svip-prt hash:ip,port timeout 0 family inet6 swap TMP-54ALMNP65S4KA7CE inet6:kube-router-svip-prt flush TMP-54ALMNP65S4KA7CE destroy TMP-3TF2IQ5MKRP6NAJL destroy TMP-54ALMNP65S4KA7CE`​`​`
I0627 09:33:40.608281 3448 network_services_controller.go:633] Returned ipset mutex lock
I0627 09:33:40.608312 3448 service_endpoints_sync.go:95] Setting up DSR Services
I0627 09:33:40.608325 3448 service_endpoints_sync.go:605] Setting up policy routing required for Direct Server Return functionality.
I0627 09:33:40.608400 3448 linux_routing.go:30] Did not find iproute2's rt_tables in location /etc/iproute2/rt_tables
I0627 09:33:40.608531 3448 linux_routing.go:30] Did not find iproute2's rt_tables in location /usr/share/iproute2/rt_tables
I0627 09:33:40.611737 3448 service_endpoints_sync.go:610] Custom routing table kube-router-dsr required for Direct Server Return is setup as expected.
I0627 09:33:40.611757 3448 service_endpoints_sync.go:613] Setting up custom route table required to add routes for external IP's.
I0627 09:33:40.611789 3448 linux_routing.go:30] Did not find iproute2's rt_tables in location /etc/iproute2/rt_tables
I0627 09:33:40.611845 3448 linux_routing.go:30] Did not find iproute2's rt_tables in location /usr/share/iproute2/rt_tables
I0627 09:33:40.617435 3448 service_endpoints_sync.go:621] Custom routing table required for Direct Server Return (external_ip) is setup as expected.
I0627 09:33:40.617489 3448 service_endpoints_sync.go:107] IPVS servers and services are synced to desired state
I0627 09:33:40.617505 3448 service_endpoints_sync.go:32] sync ipvs services took 48.173796ms
```
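The `ipset restore` input in that ipset.go log line is what actually populates the sets, so scanning it for the node IP confirms kube-router itself is adding the entry. A rough sketch (restore text abbreviated from the log above; `ipset` builds each set in a TMP-* scratch set and then swaps it into the real one):

```python
# Find which real ipsets an address ends up in, given `ipset restore`
# input that uses the add-to-temp-set-then-swap pattern seen in the log.

restore = """\
create TMP-3TF2IQ5MKRP6NAJL hash:ip timeout 0 family inet6
add TMP-3TF2IQ5MKRP6NAJL fdcd:0cd1:5dd1:28::1 timeout 0
add TMP-3TF2IQ5MKRP6NAJL fdcd:5f23:4ab4:11::1 timeout 0
swap TMP-3TF2IQ5MKRP6NAJL inet6:kube-router-svip
"""

NODE_IP = "fdcd:0cd1:5dd1:28::1"

swaps = {}  # temp set name -> real set name
adds = {}   # temp set name -> list of added entries
for line in restore.splitlines():
    parts = line.split()
    if not parts:
        continue
    if parts[0] == "add":
        adds.setdefault(parts[1], []).append(parts[2])
    elif parts[0] == "swap":
        swaps[parts[1]] = parts[2]

for tmp, real in swaps.items():
    if NODE_IP in adds.get(tmp, []):
        print(f"node IP {NODE_IP} is added to {real} (via {tmp})")
```

Against the full restore text from the log, this points at `inet6:kube-router-svip` (and `inet6:kube-router-svip-prt` for the NodePort entries).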

aauren commented 2 months ago

Still trying to figure this one out...

If your kube controller node IP is getting added to the inet6:kube-router-svip ipset, that would definitely explain what's going on. Basically, at the end of processing the KUBE-ROUTER-SERVICES chain, which is selected when the destination IP is in the inet6:kube-router-svip set, your traffic will be rejected.

Confirming this is happening should be as simple as looking at ip6tables -nvL KUBE-ROUTER-SERVICES and watching the pkts column increase on the REJECT rule, which is the last rule in the chain.

When I tested this locally, this is what I saw:

Before adding the node's ipv6 address to inet6:kube-router-svip manually:

```
#ip6tables -nvL KUBE-ROUTER-SERVICES
Chain KUBE-ROUTER-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     58   --  *      *       ::/0                 ::/0                 /* allow icmp echo requests to service IPs */ ipv6-icmptype 128
    0     0 ACCEPT     58   --  *      *       ::/0                 ::/0                 /* allow icmp ttl exceeded messages to service IPs */ ipv6-icmptype 3
    0     0 ACCEPT     58   --  *      *       ::/0                 ::/0                 /* allow icmp destination unreachable messages to service IPs */ ipv6-icmptype 1
    0     0 ACCEPT     0    --  *      *       ::/0                 ::/0                 /* allow input traffic to ipvs services */ match-set inet6:kube-router-svip-prt dst,dst
    0     0 REJECT     0    --  *      *       ::/0                 ::/0                 /* reject all unexpected traffic to service IPs */ ! match-set inet6:kube-router-local-ips dst reject-with icmp6-port-unreachable
```

After adding the controller node's IPv6 address to inet6:kube-router-svip manually and running kubectl get nodes from a kubeconfig file set to the IPv6 address of the kube controller node:

```
#ip6tables -nvL KUBE-ROUTER-SERVICES
Chain KUBE-ROUTER-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     58   --  *      *       ::/0                 ::/0                 /* allow icmp echo requests to service IPs */ ipv6-icmptype 128
    0     0 ACCEPT     58   --  *      *       ::/0                 ::/0                 /* allow icmp ttl exceeded messages to service IPs */ ipv6-icmptype 3
    2   256 ACCEPT     58   --  *      *       ::/0                 ::/0                 /* allow icmp destination unreachable messages to service IPs */ ipv6-icmptype 1
    0     0 ACCEPT     0    --  *      *       ::/0                 ::/0                 /* allow input traffic to ipvs services */ match-set inet6:kube-router-svip-prt dst,dst
    2   160 REJECT     0    --  *      *       ::/0                 ::/0                 /* reject all unexpected traffic to service IPs */ ! match-set inet6:kube-router-local-ips dst reject-with icmp6-port-unreachable
```
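If you'd rather script that check than eyeball it, the pkts counter of the REJECT rule can be pulled out of the `-nvL` output; a quick sketch (sample abbreviated from the output above, relying on the counters being the first two columns):

```python
# Extract the pkts counter of the REJECT rule from
# `ip6tables -nvL KUBE-ROUTER-SERVICES` output.

sample = """\
Chain KUBE-ROUTER-SERVICES (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     58   --  *      *       ::/0                 ::/0
    2   160 REJECT     0    --  *      *       ::/0                 ::/0
"""

def reject_pkts(output):
    """Return the pkts counter of the first REJECT rule, or None if absent."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "REJECT":
            return int(fields[0])
    return None

print(reject_pkts(sample))  # a non-zero, growing counter means traffic is being rejected
```

Polling this while running `kubectl get nodes` would show the counter climb exactly when the API server request fails.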

So the question becomes, then, how does your node's address end up in that set?

After looking into it some more, I noticed that this set (inet6:kube-router-svip) is built from the list of IPVS services: https://github.com/cloudnativelabs/kube-router/blob/master/pkg/controllers/proxy/network_services_controller.go#L662

This is probably not the best logic, but it has been this way for years. I'm wondering if there might be something else on your system that is creating an IPVS service for your node's IP address?
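If I'm reading that code right, the effect is roughly the following (a simplified Python model, not the actual Go implementation): every VIP of every IPVS service lands in the svip set, so any stray IPVS service keyed on the node IP is enough to pull the node IP in. The service list below is a hypothetical example shaped like the outputs in this thread:

```python
# Simplified model of how the svip ipsets are derived from the IPVS
# service list (the real logic is Go, linked above; this is a sketch).

def build_svip_sets(ipvs_services):
    """ipvs_services: iterable of (vip, proto, port) tuples."""
    svip, svip_prt = set(), set()
    for vip, proto, port in ipvs_services:
        svip.add(vip)                  # every service VIP, whatever it is
        svip_prt.add((vip, proto, port))
    return svip, svip_prt

services = [
    ("fdcd:5f23:4ab4:11::1", "tcp", 443),    # kubernetes ClusterIP
    ("fdcd:5f23:4ab4:11::10", "udp", 53),    # CoreDNS
    ("fdcd:0cd1:5dd1:28::1", "tcp", 30868),  # NodePort service keyed on the node IP
]

svip, svip_prt = build_svip_sets(services)
print("fdcd:0cd1:5dd1:28::1" in svip)  # True: the node IP ends up in the svip set
```

With the node IP in svip, all traffic to it is funneled into KUBE-ROUTER-SERVICES, and anything that isn't an exact (ip, proto, port) match in svip-prt hits the final REJECT.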

Can you run ipvsadm -L -n on your controller node and send me the output?

I'm going to assume that fdcd:0cd1:5dd1:28::1 or fdcd:0cd1:5dd1:28:: will show up in that output.

Unfortunately, IPVS services don't have names, so it might be a bit of a guess to figure out what is creating that. But maybe the port will tell us? Specifically, if that port relates to a Kubernetes service, then maybe kube-router is somehow errantly creating it? If not, then maybe some other part of your OS or setup is creating it?

aauren commented 2 months ago

If this is indeed not a kube-router created IPVS service, then the following fix would resolve the issue you're experiencing: https://github.com/cloudnativelabs/kube-router/pull/1699

That pipeline should build a test container that you could try out as a shortcut, to see whether it resolves this issue.

The container created from the pipeline was: cloudnativelabs/kube-router-git:PR-1699

k6av commented 2 months ago

Thanks for the detailed troubleshooting, and sorry for the delay on my part. In principle the node is used only as a Kubernetes host so nothing should be creating any extra IPVS services.

Can you run ipvsadm -L -n on your controller node and send me the output?

```
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  [fdae:f9b4:30d1:11::1]:443 rr
  -> [fdae:2ee3:4f01:4::1]:6443   Masq    1      7          0         
TCP  [fdae:f9b4:30d1:11::10]:53 rr
  -> [fdae:f9b4:30d1:10::3]:53    Masq    1      0          0         
TCP  [fdae:f9b4:30d1:11::10]:9153 rr
  -> [fdae:f9b4:30d1:10::3]:9153  Masq    1      0          0         
UDP  [fdae:f9b4:30d1:11::10]:53 rr
  -> [fdae:f9b4:30d1:10::3]:53    Masq    1      0          0
```

I've just done a quick test with the PR container, and the issue doesn't seem to occur any more. I'm gonna do some more testing on my environment when I can, but it does seem the issue is resolved for now.

aauren commented 2 months ago

Hmm... fdcd:0cd1:5dd1:28::1 definitely appears to be pointing to pods. It looks like it is advertising the same service as [fdcd:5f23:4ab4:11::7b8c]:80.

Is it possible that port 31983 is a NodePort?

From the info in your dump it really looks like something that kube-router could be creating, which, if so, would be a bug. However, it is odd that this patch then resolves the issue you were seeing.

k6av commented 2 months ago

Is it possible that port 31983 is a NodePort?

Good catch, I just realized I forgot to delete a NodePort service I'd used for testing. I've edited my comment with the output sans the NodePort service. Now it seems even stranger; there is no IPVS entry for the node IP.

k6av commented 2 months ago

I'm gonna do some more testing on my environment when I can, but it does seem the issue is resolved for now.

As far as I can tell the issue is resolved, so the PR should be good to merge (assuming everything else is in order). Thanks so much for taking the time to troubleshoot & fix the issue (and also #1698, which was causing issues for me as well).