Fabian-K opened 2 months ago
I couldn't reproduce this issue in my environment. Could you check whether the traffic is at least being forwarded correctly out of the node when you run the test from the pod?
tcpdump -i eth0 -n port 443
Thank you for looking into this!
A successful execution of nc -4 -zv -w1 google.com 443 immediately results in:
11:02:40.012163 IP 10.42.0.46.37702 > 142.250.186.78.443: Flags [S], seq 3633463507, win 64860, options [mss 1410,sackOK,TS val 2231993490 ecr 0,nop,wscale 7], length 0
11:02:40.015861 IP 142.250.186.78.443 > 10.42.0.46.37702: Flags [S.], seq 3656415161, ack 3633463508, win 65535, options [mss 1412,sackOK,TS val 4174670192 ecr 2231993490,nop,wscale 8], length 0
11:02:40.015910 IP 10.42.0.46.37702 > 142.250.186.78.443: Flags [.], ack 1, win 507, options [nop,nop,TS val 2231993493 ecr 4174670192], length 0
11:02:40.016076 IP 10.42.0.46.37702 > 142.250.186.78.443: Flags [F.], seq 1, ack 1, win 507, options [nop,nop,TS val 2231993494 ecr 4174670192], length 0
11:02:40.019749 IP 142.250.186.78.443 > 10.42.0.46.37702: Flags [F.], seq 1, ack 2, win 256, options [nop,nop,TS val 4174670195 ecr 2231993494], length 0
11:02:40.019793 IP 10.42.0.46.37702 > 142.250.186.78.443: Flags [.], ack 2, win 507, options [nop,nop,TS val 2231993497 ecr 4174670195], length 0
A failed execution of nc -4 -zv -w1 google.com 443 (I omitted the timeout) produces over time:
11:09:03.832261 IP 10.42.0.46.49944 > 142.250.186.78.443: Flags [S], seq 520064118, win 64860, options [mss 1410,sackOK,TS val 2232377310 ecr 0,nop,wscale 7], length 0
11:09:04.863110 IP 10.42.0.46.49944 > 142.250.186.78.443: Flags [S], seq 520064118, win 64860, options [mss 1410,sackOK,TS val 2232378341 ecr 0,nop,wscale 7], length 0
11:09:05.887040 IP 10.42.0.46.49944 > 142.250.186.78.443: Flags [S], seq 520064118, win 64860, options [mss 1410,sackOK,TS val 2232379365 ecr 0,nop,wscale 7], length 0
11:09:06.911110 IP 10.42.0.46.49944 > 142.250.186.78.443: Flags [S], seq 520064118, win 64860, options [mss 1410,sackOK,TS val 2232380389 ecr 0,nop,wscale 7], length 0
11:09:07.935093 IP 10.42.0.46.49944 > 142.250.186.78.443: Flags [S], seq 520064118, win 64860, options [mss 1410,sackOK,TS val 2232381413 ecr 0,nop,wscale 7], length 0
11:09:08.959048 IP 10.42.0.46.49944 > 142.250.186.78.443: Flags [S], seq 520064118, win 64860, options [mss 1410,sackOK,TS val 2232382437 ecr 0,nop,wscale 7], length 0
If I understand it correctly, 142.250.186.78 is one of the IP addresses of the target server (here Google), and for some reason sometimes no TCP connection can be established at all? 🤔
Where are you running the tcpdump, from the pod or from the node? Could you do it from the node? You can redact your public IP address if needed.
You are right, that was from the pod. Here are the results from the node. I made two adjustments, however: the main network interface is enp5s0, and since there is quite a lot of traffic on port 443, I switched to port 587 and smtp.gmail.com (I originally noticed these issues when sending emails). There is no other traffic on that port on the server, so I used
tcpdump -i enp5s0 -n port 587
and nc -4 -zv smtp.gmail.com 587
A successful execution of nc -4 -zv smtp.gmail.com 587 immediately results in:
15:55:19.684532 IP <PUBLIC IP>.63976 > 74.125.206.108.587: Flags [S], seq 2360491525, win 64860, options [mss 1410,sackOK,TS val 2537310217 ecr 0,nop,wscale 7], length 0
15:55:19.694849 IP 74.125.206.108.587 > <PUBLIC IP>.63976: Flags [S.], seq 3138931281, ack 2360491526, win 65535, options [mss 1412,sackOK,TS val 369829671 ecr 2537310217,nop,wscale 8], length 0
15:55:19.694929 IP <PUBLIC IP>.63976 > 74.125.206.108.587: Flags [.], ack 1, win 507, options [nop,nop,TS val 2537310227 ecr 369829671], length 0
15:55:19.695114 IP <PUBLIC IP>.63976 > 74.125.206.108.587: Flags [F.], seq 1, ack 1, win 507, options [nop,nop,TS val 2537310228 ecr 369829671], length 0
15:55:19.705899 IP 74.125.206.108.587 > <PUBLIC IP>.63976: Flags [.], ack 2, win 256, options [nop,nop,TS val 369829682 ecr 2537310228], length 0
15:55:19.706408 IP 74.125.206.108.587 > <PUBLIC IP>.63976: Flags [F.], seq 1, ack 2, win 256, options [nop,nop,TS val 369829682 ecr 2537310228], length 0
15:55:19.706446 IP <PUBLIC IP>.63976 > 74.125.206.108.587: Flags [.], ack 2, win 507, options [nop,nop,TS val 2537310239 ecr 369829682], length 0
A failed execution of nc -4 -zv smtp.gmail.com 587 produces over time:
15:57:06.808798 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537417341 ecr 0,nop,wscale 7], length 0
15:57:07.871077 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537418404 ecr 0,nop,wscale 7], length 0
15:57:08.895179 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537419428 ecr 0,nop,wscale 7], length 0
15:57:09.919173 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537420452 ecr 0,nop,wscale 7], length 0
15:57:10.943151 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537421476 ecr 0,nop,wscale 7], length 0
15:57:11.967181 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537422500 ecr 0,nop,wscale 7], length 0
15:57:14.015171 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537424548 ecr 0,nop,wscale 7], length 0
15:57:18.047198 IP <PUBLIC IP>.3566 > 74.125.206.108.587: Flags [S], seq 582598544, win 64860, options [mss 1410,sackOK,TS val 2537428580 ecr 0,nop,wscale 7], length 0
Not sure if that is relevant, but the server is a dedicated server hosted by Hetzner.
I don't know about Hetzner, but the traffic seems to be forwarded out of the node correctly by flannel. You aren't getting any reply from the internet. Is there any configuration on the provider's network that could drop the traffic? Perhaps some rules to prevent a DDoS attack.
Hmm... nothing that I'm aware of. What bugs me the most is that it is 100% reliable when running directly from the node; anything on the provider side would also affect that, right? It only starts to fail when running from within a pod. And it only fails from the pod when using IPv4; IPv6 is 100% reliable there as well. 🤔
I did a small test with 100 attempts (a sketch of the test loop is shown after the results):
From Node: nc -4 -zv google.com 443
100% reliable (100/100)
From Node: nc -6 -zv google.com 443
100% reliable (100/100)
From Pod: nc -4 -zv google.com 443
~50% reliable (53/100)
From Pod: nc -6 -zv google.com 443
100% reliable (100/100)
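For reference, a loop like the following runs this kind of test (a sketch; run it directly on the node, or wrap it in kubectl exec to run it from inside a pod — the pod name is a placeholder):
# Count how many of 100 IPv4 connection attempts to google.com:443 succeed
# (swap -4 for -6 to test IPv6)
ok=0
for i in $(seq 1 100); do
  nc -4 -zv -w1 google.com 443 >/dev/null 2>&1 && ok=$((ok + 1))
done
echo "successful: $ok/100"
# From a pod instead (pod name is a placeholder):
# kubectl exec <some-pod> -- sh -c 'ok=0; for i in $(seq 1 100); do nc -4 -zv -w1 google.com 443 >/dev/null 2>&1 && ok=$((ok+1)); done; echo "successful: $ok/100"'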
Could you try this from the node?
ethtool --offload eth0 rx off tx off
ethtool -K eth0 gso off
replacing eth0 with the name of the interface (enp5s0 in your case)
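To verify the change took effect, the current offload settings can be listed as well (enp5s0 assumed as the interface name from above):
# List offload features and their current on/off state
ethtool -k enp5s0 | grep -E 'rx-checksumming|tx-checksumming|generic-segmentation-offload'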
I think I found the reason; however, I can't fully explain it yet. There is a firewall on the provider side that by default does not filter IPv6. This explains why it always works for IPv6, both on the node and from the pod.
In addition to some rules like only allowing incoming 80 and 443, it also contains by default an entry called "TCP established" with version ipv4, protocol TCP, target port 32768-65535, TCP flags ack -> action accept. As soon as this entry is present, I see the behavior as described.
When I temporarily replace it with something like version ipv4, protocol TCP, target port 0-65535, TCP flags ack -> action accept, the issue is resolved.
Is a different ephemeral port range (other than 32768-65535) used when the traffic comes from the pod via flannel? 🤔
For the traffic from the pods, Flannel only configures basic NAT with iptables and MASQUERADE.
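For example, the rule can be seen on the node with something like this (chain names differ between flannel/k3s versions):
# Show the NAT rules that rewrite the source address of outgoing pod traffic
iptables -t nat -S | grep -i MASQUERADE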
In that context I found https://github.com/canonical/microk8s/issues/3909, which describes the same issue with Calico. It looks like, for some reason, a wider ephemeral port range is used. 1024-65535 seems to work, matching https://datatracker.ietf.org/doc/html/rfc6056 🤔.
I currently don't know where to follow up on this, but at least it does not seem to be an issue exclusive to flannel.
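One way to compare the port ranges involved (a sketch; the pod name is a placeholder, and whether the rule carries --random-fully depends on the flannel/k3s version):
# Ephemeral port range used for connections made directly from the node
sysctl net.ipv4.ip_local_port_range
# Ephemeral port range inside a pod's network namespace
kubectl exec <some-pod> -- cat /proc/sys/net/ipv4/ip_local_port_range
# If the MASQUERADE rule includes --random-fully, the post-NAT source port is
# randomized over roughly 1024-65535 instead of keeping the pod's port, which
# would explain node-side source ports below 32768
iptables -t nat -S | grep -i 'MASQUERADE.*random'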
Thanks a lot @rbrtbnfgl for the support! 🙏
Hi,
I installed k3s on top of Ubuntu 24.04 using flannel VXLAN (k3s config below). When connecting to external services over IPv4 from within a pod, the connections sometimes succeed and sometimes time out. Over IPv6, they always work. The same connections made directly from the host also always succeed (both IPv4 and IPv6).
Unfortunately, my knowledge of networking is quite limited. Do you have any idea what could cause this behavior?
Thanks, Fabian
Connecting to google.com from a pod using IPv4 sometimes fails:
Connecting to google.com from a pod using IPv6 always works:
Connecting to google.com from the host using IPv4 and IPv6 always works:
Expected Behavior
Reliable connectivity from cluster to external service
Current Behavior
Frequent timeouts when connecting to external services using IPv4
Steps to Reproduce (for bugs)
nc -4 -zv -w1 google.com 443
from within a pod
Context
Your Environment