cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

Geneve DSR load balancer mode does not work for TCP #33609

Open stevefan1999-personal opened 2 months ago

stevefan1999-personal commented 2 months ago

What happened?

I just set up a new Node IPAM load balancer using Geneve in DSR mode, as described in the documentation. I noticed that with externalTrafficPolicy: Local, no incoming traffic arrives at all, although I think that is a separate issue.

If I change it to externalTrafficPolicy: Cluster, a connection is made, but it gets stuck without any I/O until it is reset.

I'm running Cilium behind a WireGuard connection because my homelab has no public IP to listen on. So I set up a WireGuard network to my VPS and used a CiliumLoadBalancerIPPool to expose the mail server as a load balancer on the VPS's public IP (plus an egress policy that routes traffic through the VPS, so that all traffic is relayed). I have set it up as follows:

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: vps-ip
spec:
  blocks:
    - cidr: 206.XXX.YYY.ZZZ/32

In SNAT mode, I can reach my mail server at 206.XXX.YYY.ZZZ, but obviously I don't get the correct client IP address: every client resolves to an internal IP, which effectively turns the mail server into an open relay. I'm not sure what to do now.

Cilium Version

1.15.6

Kernel Version

Talos Linux 1.6.5

Kubernetes Version

v1.30.0

Regression

No response

Sysdump

No response

Relevant log output

No response

Anything else?

Helm config

kubeProxyReplacement: true
bandwidthManager:
  bbr: true
  enabled: true
bpf:
  masquerade: true
  tproxy: true
cluster:
  name: kubehyper
egressGateway:
  enabled: true
gatewayAPI:
  enabled: false
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
ingressController:
  enabled: false
ipMasqAgent:
  enabled: true
ipv6:
  enabled: false
nodeIPAM:
  enabled: true
sctp:
  enabled: true
tunnelProtocol: geneve
k8sServiceHost: localhost
k8sServicePort: 7445
loadBalancer:
  algorithm: maglev
  dsrDispatch: geneve
  mode: snat # depends, okay with snat, fails with dsr
routingMode: tunnel
operator:
  replicas: 3
securityContext:
  capabilities:
    ciliumAgent: [CHOWN, KILL, NET_ADMIN, NET_RAW, IPC_LOCK, SYS_ADMIN, SYS_RESOURCE, DAC_OVERRIDE, FOWNER, SETGID, SETUID]
    cleanCiliumState: [NET_ADMIN, SYS_ADMIN, SYS_RESOURCE]
cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup

stevefan1999-personal commented 1 month ago

My game server experiments show that none of my UDP-based servers have this issue, and the client source IP address is correctly preserved in Geneve DSR mode. Only TCP connections get stuck, for some reason.

stevefan1999-personal commented 1 month ago

"daemon creation failed: BPF masquerading to route source (--enable-masquerade-to-route-source=\"true\") currently not supported with BPF-based masquerading (--enable-bpf-masquerade=\"true\")"

Huh...

stevefan1999-personal commented 1 month ago

I did a real-world check with Wireshark.

[Screenshot: request to the LB in DSR mode]

[Screenshot: request to the LB in SNAT mode]

stevefan1999-personal commented 1 month ago

SNAT

tcpdump of cilium_geneve on the node that hosted the mail server pod:

18:09:19.369576 IP (tos 0x0, ttl 118, id 44016, offset 0, flags [DF], proto TCP (6), length 52)
    10.243.139.190.62327 > 10.240.9.152.submissions: Flags [S], cksum 0xa96d (correct), seq 1720110851, win 62258, options [mss 536,nop,wscale 8,nop,nop,sackOK], length 0
18:09:19.369628 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.240.9.152.submissions > 10.243.139.190.62327: Flags [S.], cksum 0xab5f (incorrect -> 0xc4e0), seq 645902441, ack 1720110852, win 64860, options [mss 1410,nop,nop,sackOK,nop,wscale 7], length 0
18:09:19.387951 IP (tos 0x0, ttl 118, id 44017, offset 0, flags [DF], proto TCP (6), length 40)
    10.243.139.190.62327 > 10.240.9.152.submissions: Flags [.], cksum 0xfedc (correct), seq 1, ack 1, win 1025, length 0
18:09:19.388594 IP (tos 0x0, ttl 118, id 44018, offset 0, flags [DF], proto TCP (6), length 122)
    10.243.139.190.62327 > 10.240.9.152.submissions: Flags [P.], cksum 0xdbf7 (correct), seq 1:83, ack 1, win 1025, length 82
18:09:19.388628 IP (tos 0x0, ttl 64, id 17689, offset 0, flags [DF], proto TCP (6), length 40)
    10.240.9.152.submissions > 10.243.139.190.62327: Flags [.], cksum 0xab53 (incorrect -> 0x0091), seq 1, ack 83, win 507, length 0
18:09:19.400408 IP (tos 0x0, ttl 64, id 17690, offset 0, flags [DF], proto TCP (6), length 40)
    10.240.9.152.submissions > 10.243.139.190.62327: Flags [F.], cksum 0xab53 (incorrect -> 0x0090), seq 1, ack 83, win 507, length 0
18:09:19.414582 IP (tos 0x0, ttl 118, id 44019, offset 0, flags [DF], proto TCP (6), length 40)
    10.243.139.190.62327 > 10.240.9.152.submissions: Flags [.], cksum 0xfe89 (correct), seq 83, ack 2, win 1025, length 0
18:09:19.416129 IP (tos 0x0, ttl 118, id 44020, offset 0, flags [DF], proto TCP (6), length 40)
    10.243.139.190.62327 > 10.240.9.152.submissions: Flags [F.], cksum 0xfe88 (correct), seq 83, ack 2, win 1025, length 0
18:09:19.416171 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.240.9.152.submissions > 10.243.139.190.62327: Flags [.], cksum 0x008f (correct), seq 2, ack 84, win 507, length 0

tcpdump of cilium_geneve on the node hosting the load balancer, which forwards to the target pod:

reading from file -, link-type EN10MB (Ethernet), snapshot length 4096
18:11:27.740698 IP (tos 0x0, ttl 118, id 44021, offset 0, flags [DF], proto TCP (6), length 52)
    10.243.139.190.62787 > 10.240.9.152.submissions: Flags [S], cksum 0x103f (correct), seq 2353545380, win 62258, options [mss 536,nop,wscale 8,nop,nop,sackOK], length 0
18:11:27.746795 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.240.9.152.submissions > 10.243.139.190.62787: Flags [S.], cksum 0xa81f (correct), seq 1827794313, ack 2353545381, win 64860, options [mss 1410,nop,nop,sackOK,nop,wscale 7], length 0
18:11:27.752287 IP (tos 0x0, ttl 118, id 44022, offset 0, flags [DF], proto TCP (6), length 40)
    10.243.139.190.62787 > 10.240.9.152.submissions: Flags [.], cksum 0xe21b (correct), seq 1, ack 1, win 1025, length 0
18:11:27.753682 IP (tos 0x0, ttl 118, id 44023, offset 0, flags [DF], proto TCP (6), length 122)
    10.243.139.190.62787 > 10.240.9.152.submissions: Flags [P.], cksum 0xbf36 (correct), seq 1:83, ack 1, win 1025, length 82
18:11:27.759755 IP (tos 0x0, ttl 64, id 39092, offset 0, flags [DF], proto TCP (6), length 40)
    10.240.9.152.submissions > 10.243.139.190.62787: Flags [.], cksum 0xe3cf (correct), seq 1, ack 83, win 507, length 0
18:11:27.773097 IP (tos 0x0, ttl 64, id 39093, offset 0, flags [DF], proto TCP (6), length 40)
    10.240.9.152.submissions > 10.243.139.190.62787: Flags [F.], cksum 0xe3ce (correct), seq 1, ack 83, win 507, length 0
18:11:27.778314 IP (tos 0x0, ttl 118, id 44024, offset 0, flags [DF], proto TCP (6), length 40)
    10.243.139.190.62787 > 10.240.9.152.submissions: Flags [.], cksum 0xe1c8 (correct), seq 83, ack 2, win 1025, length 0
18:11:27.779591 IP (tos 0x0, ttl 118, id 44025, offset 0, flags [DF], proto TCP (6), length 40)
    10.243.139.190.62787 > 10.240.9.152.submissions: Flags [F.], cksum 0xe1c7 (correct), seq 83, ack 2, win 1025, length 0
18:11:27.787552 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.240.9.152.submissions > 10.243.139.190.62787: Flags [.], cksum 0xe3cd (correct), seq 2, ack 84, win 507, length 0
stevefan1999-personal commented 1 month ago

Geneve DSR

tcpdump of cilium_geneve on the node that hosted the mail server pod:

reading from file -, link-type EN10MB (Ethernet), snapshot length 4096
18:18:49.788674 IP (tos 0x0, ttl 119, id 44112, offset 0, flags [DF], proto TCP (6), length 52)
    <MY OWN ISP IP>.ctinets.com.61579 > 10.240.9.152.submissions: Flags [S], cksum 0xf8dd (correct), seq 3162704892, win 62258, options [mss 536,nop,wscale 8,nop,nop,sackOK], length 0

tcpdump of cilium_geneve on the node hosting the load balancer, which forwards to the target pod:

reading from file -, link-type EN10MB (Ethernet), snapshot length 4096
18:18:49.755979 IP (tos 0x0, ttl 119, id 44112, offset 0, flags [DF], proto TCP (6), length 52)
    <MY OWN ISP IP>.ctinets.com.61579 > 10.240.9.152.submissions: Flags [S], cksum 0xf8dd (correct), seq 3162704892, win 62258, options [mss 536,nop,wscale 8,nop,nop,sackOK], length 0

Looks like there is no response...? The handshake doesn't even complete.

stevefan1999-personal commented 1 month ago

Is the MTU even correct for TCP connections over tunneled Geneve DSR? In any case, what we can tell here is that the TCP connection did not have a valid return path: my original source IP was not taken into account when replying.
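The MTU question can be sanity-checked with plain arithmetic. The sketch below is illustrative only: it assumes IPv4 headers without options, the common 50-byte Geneve overhead (outer IPv4 + UDP + Geneve base header + inner Ethernet), and 60 bytes for WireGuard over IPv4. It does not account for any Geneve options added for DSR, whose size is not assumed here. The function and constant names are made up for this example.

```python
# Minimal MTU/MSS arithmetic for the encapsulation layers discussed in this thread.
# Header sizes in bytes, assuming IPv4 everywhere and no IP/TCP options.
IPV4_HDR = 20
UDP_HDR = 8
TCP_HDR = 20
INNER_ETHERNET_HDR = 14     # Geneve carries a full inner Ethernet frame
GENEVE_BASE_HDR = 8         # fixed Geneve header; options (e.g. for DSR) not included
WIREGUARD_HDR_AND_TAG = 32  # WireGuard data message header (16) + Poly1305 tag (16)

# Outer IPv4 + UDP + Geneve base header + inner Ethernet = 50 bytes.
GENEVE_V4_OVERHEAD = IPV4_HDR + UDP_HDR + GENEVE_BASE_HDR + INNER_ETHERNET_HDR
# Outer IPv4 + UDP + WireGuard header and tag = 60 bytes.
WIREGUARD_V4_OVERHEAD = IPV4_HDR + UDP_HDR + WIREGUARD_HDR_AND_TAG


def inner_mtu(link_mtu: int, *overheads: int) -> int:
    """MTU left for the inner IP packet after subtracting each encapsulation overhead."""
    return link_mtu - sum(overheads)


def tcp_mss(ip_mtu: int) -> int:
    """Largest TCP payload that fits in an IPv4 packet of the given MTU."""
    return ip_mtu - IPV4_HDR - TCP_HDR


if __name__ == "__main__":
    # Geneve directly over a 1500-byte link.
    mtu = inner_mtu(1500, GENEVE_V4_OVERHEAD)
    print(mtu, tcp_mss(mtu))        # 1450 1410
    # Geneve over a WireGuard interface with MTU 1420 (wg-quick's usual default).
    mtu_wg = inner_mtu(1420, GENEVE_V4_OVERHEAD)
    print(mtu_wg, tcp_mss(mtu_wg))  # 1370 1330
```

With a 1500-byte underlay this gives an inner MTU of 1450 and an MSS of 1410, which is consistent with the mss 1410 the backend advertises in its SYN-ACKs above. If the Geneve traffic instead rides over a WireGuard interface with MTU 1420, the same arithmetic gives an MSS of 1330, so if pods still use a 1450-byte MTU, full-sized segments would not fit through the tunnel. The handshake segments in these captures are at most 60 bytes long, though, so an MTU mismatch on its own would not explain a SYN-ACK that never arrives.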

stevefan1999-personal commented 1 month ago

There's more. I checked the host interface of the node hosting the pod (this time a game server on port 25565), with the load balancer still in DSR mode, and here's what I got:

reading from file -, link-type EN10MB (Ethernet), snapshot length 4096
19:17:58.928626 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.58468: Flags [S.], cksum 0x28da (correct), seq 440089612, ack 4104243680, win 64308, options [mss 1410,sackOK,TS val 110442570 ecr 1520752668,nop,wscale 7], length 0
19:18:02.796539 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 48)
    <Load Balancer Server IP>.25565 > 51.75.66.201.1001: Flags [S.], cksum 0x9925 (correct), seq 2743871116, ack 2717961258, win 64860, options [mss 1410,nop,nop,sackOK], length 0
19:18:07.010743 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.58468: Flags [S.], cksum 0x0948 (correct), seq 440089612, ack 4104243680, win 64308, options [mss 1410,sackOK,TS val 110450652 ecr 1520752668,nop,wscale 7], length 0
19:18:08.908628 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 48)
    <Load Balancer Server IP>.25565 > 51.75.66.201.1001: Flags [S.], cksum 0x9925 (correct), seq 2743871116, ack 2717961258, win 64860, options [mss 1410,nop,nop,sackOK], length 0
19:18:13.218923 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xcc56 (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110456860 ecr 1520794508,nop,wscale 7], length 0
19:18:14.224595 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xc868 (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110457866 ecr 1520794508,nop,wscale 7], length 0
19:18:16.294105 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xc053 (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110459935 ecr 1520794508,nop,wscale 7], length 0
19:18:17.100594 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 48)
    <Load Balancer Server IP>.25565 > 51.75.66.201.1001: Flags [S.], cksum 0x9925 (correct), seq 2743871116, ack 2717961258, win 64860, options [mss 1410,nop,nop,sackOK], length 0
19:18:17.313813 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xbc57 (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110460955 ecr 1520794508,nop,wscale 7], length 0
19:18:18.337976 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xb857 (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110461979 ecr 1520794508,nop,wscale 7], length 0
19:18:20.353849 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xb077 (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110463995 ecr 1520794508,nop,wscale 7], length 0
19:18:22.380556 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xa88c (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110466022 ecr 1520794508,nop,wscale 7], length 0
19:18:23.244612 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.58468: Flags [S.], cksum 0xc9dd (correct), seq 440089612, ack 4104243680, win 64308, options [mss 1410,sackOK,TS val 110466886 ecr 1520752668,nop,wscale 7], length 0
19:18:24.418069 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xa097 (correct), seq 708622976, ack 4013035055, win 64308, options [mss 1410,sackOK,TS val 110468059 ecr 1520794508,nop,wscale 7], length 0

This was captured on the load balancer server:

reading from file -, link-type EN10MB (Ethernet), snapshot length 4096
19:18:01.783870 IP (tos 0x0, ttl 46, id 1, offset 0, flags [DF], proto TCP (6), length 48)
    51.75.66.201.1001 > <Load Balancer Server IP>.25565: Flags [S], cksum 0xe079 (correct), seq 2717961257, win 32768, options [mss 1460,nop,nop,sackOK], length 0
19:18:07.007615 IP (tos 0x0, ttl 55, id 42618, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.58468 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x32f3 (correct), seq 4104243679, win 64240, options [mss 1460,sackOK,TS val 1520788300 ecr 0,nop,wscale 7], length 0
19:18:14.238911 IP (tos 0x0, ttl 55, id 5520, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.38992 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x23e8 (correct), seq 4013035054, win 64240, options [mss 1460,sackOK,TS val 1520795532 ecr 0,nop,wscale 7], length 0
19:18:17.310808 IP (tos 0x0, ttl 55, id 5523, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.38992 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x17e8 (correct), seq 4013035054, win 64240, options [mss 1460,sackOK,TS val 1520798604 ecr 0,nop,wscale 7], length 0
19:18:18.334903 IP (tos 0x0, ttl 55, id 5524, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.38992 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x13e8 (correct), seq 4013035054, win 64240, options [mss 1460,sackOK,TS val 1520799628 ecr 0,nop,wscale 7], length 0
19:18:24.414988 IP (tos 0x0, ttl 55, id 5526, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.38992 > <Load Balancer Server IP>.25565: Flags [S], cksum 0xfc27 (correct), seq 4013035054, win 64240, options [mss 1460,sackOK,TS val 1520805708 ecr 0,nop,wscale 7], length 0
19:18:32.607119 IP (tos 0x0, ttl 55, id 5527, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.38992 > <Load Balancer Server IP>.25565: Flags [S], cksum 0xdc27 (correct), seq 4013035054, win 64240, options [mss 1460,sackOK,TS val 1520813900 ecr 0,nop,wscale 7], length 0

I ran curl <Load Balancer IP>:25565 -v on the third-party client, which has its own public IP rather than being masqueraded behind a home router.

Please ignore 51.75.66.201
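To check whether the backend answered each SYN that reached the load balancer, the two captures above can be matched by sequence number: a SYN-ACK acknowledges the client's initial sequence number plus one. A minimal sketch, using trimmed lines copied from those captures; the regex assumes the tcpdump -v line layout shown here, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional, Tuple

# Matches the "src.port > dst.port: Flags [...], ..., seq N[, ack M]" part of tcpdump -v output.
SEGMENT_RE = re.compile(
    r"(?P<src>.+?)\.(?P<sport>\d+) > (?P<dst>.+?)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]*)\].*?seq (?P<seq>\d+)(?::\d+)?(?:, ack (?P<ack>\d+))?"
)


class Segment(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    flags: str
    seq: int
    ack: Optional[int]


def parse_segment(line: str) -> Optional[Segment]:
    """Parse one tcpdump segment line; return None for lines that do not match."""
    m = SEGMENT_RE.search(line.strip())
    if m is None:
        return None
    ack = int(m["ack"]) if m["ack"] is not None else None
    return Segment(m["src"], int(m["sport"]), m["dst"], int(m["dport"]),
                   m["flags"], int(m["seq"]), ack)


def syn_answered(lb_lines: Iterable[str],
                 backend_lines: Iterable[str]) -> Dict[Tuple[str, int, int], bool]:
    """Map each distinct client SYN (client, port, seq) to whether a matching SYN-ACK was seen."""
    synacks = set()
    for seg in map(parse_segment, backend_lines):
        if seg is not None and seg.flags == "S." and seg.ack is not None:
            synacks.add((seg.dst, seg.dport, seg.ack))
    result = {}
    for seg in map(parse_segment, lb_lines):
        if seg is not None and seg.flags == "S":
            # Retransmitted SYNs share the same seq, so they collapse into one key.
            expected_ack = (seg.seq + 1) % 2**32
            result[(seg.src, seg.sport, seg.seq)] = (seg.src, seg.sport, expected_ack) in synacks
    return result


# Trimmed excerpts: SYNs seen on the load balancer, SYN-ACKs seen on the backend node.
LB_CAPTURE = [
    "<Third Party Client IP>.58468 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x32f3 (correct), seq 4104243679, win 64240",
    "<Third Party Client IP>.38992 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x23e8 (correct), seq 4013035054, win 64240",
    "<Third Party Client IP>.38992 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x17e8 (correct), seq 4013035054, win 64240",
]
BACKEND_CAPTURE = [
    "<Load Balancer Server IP>.25565 > <Third Party Client IP>.58468: Flags [S.], cksum 0x28da (correct), seq 440089612, ack 4104243680, win 64308",
    "<Load Balancer Server IP>.25565 > <Third Party Client IP>.38992: Flags [S.], cksum 0xcc56 (correct), seq 708622976, ack 4013035055, win 64308",
]

if __name__ == "__main__":
    for key, answered in syn_answered(LB_CAPTURE, BACKEND_CAPTURE).items():
        print(key, answered)
```

On these excerpts both connection attempts map to True (4104243679 + 1 = 4104243680 and 4013035054 + 1 = 4013035055), so the backend did reply to each SYN it received; the repeated, identical SYN-ACKs in the backend capture suggest those replies are lost somewhere on the return path.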

stevefan1999-personal commented 1 month ago

This looks suspicious.

This is on my third party client server:

root@backup:~# tcpdump -vvvnnn '(src <Homelab Network IP> or dst <Homelab Network IP> or src <Load Balancer Server IP> or dst <Load Balancer Server IP>) and not (port 22 or port 8007 or port 3260)'
tcpdump: listening on enp1s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:10:16.352980 IP (tos 0x0, ttl 64, id 25731, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.59984 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0x8d20), seq 1162860748, win 64240, options [mss 1460,sackOK,TS val 746509135 ecr 0,nop,wscale 7], length 0
20:10:17.313553 IP (tos 0x0, ttl 64, id 55941, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xf130), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746510095 ecr 0,nop,wscale 7], length 0
20:10:18.336970 IP (tos 0x0, ttl 64, id 55942, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xed30), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746511119 ecr 0,nop,wscale 7], length 0
20:10:19.360961 IP (tos 0x0, ttl 64, id 55943, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xe930), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746512143 ecr 0,nop,wscale 7], length 0
20:10:20.384963 IP (tos 0x0, ttl 64, id 55944, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xe530), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746513167 ecr 0,nop,wscale 7], length 0
20:10:21.408969 IP (tos 0x0, ttl 64, id 55945, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xe130), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746514191 ecr 0,nop,wscale 7], length 0
20:10:22.432967 IP (tos 0x0, ttl 64, id 55946, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xdd30), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746515215 ecr 0,nop,wscale 7], length 0
20:10:24.448970 IP (tos 0x0, ttl 64, id 55947, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xd550), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746517231 ecr 0,nop,wscale 7], length 0
20:10:28.640965 IP (tos 0x0, ttl 64, id 55948, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xc4f0), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746521423 ecr 0,nop,wscale 7], length 0
20:10:36.832978 IP (tos 0x0, ttl 64, id 55949, offset 0, flags [DF], proto TCP (6), length 60)
    <Third Party Client IP>.33338 > <Load Balancer Server IP>.25565: Flags [S], cksum 0x5405 (incorrect -> 0xa4f0), seq 1755166148, win 64240, options [mss 1460,sackOK,TS val 746529615 ecr 0,nop,wscale 7], length 0

This is on my home router:

20:10:17.319418 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.33338: Flags [S.], cksum 0x23b0 (correct), seq 2412780328, ack 1755166149, win 64308, options [mss 1410,sackOK,TS val 113580960 ecr 746510095,nop,wscale 7], length 0
20:10:18.342334 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.33338: Flags [S.], cksum 0x1fb1 (correct), seq 2412780328, ack 1755166149, win 64308, options [mss 1410,sackOK,TS val 113581983 ecr 746510095,nop,wscale 7], length 0
20:10:19.366294 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.33338: Flags [S.], cksum 0x1bb1 (correct), seq 2412780328, ack 1755166149, win 64308, options [mss 1410,sackOK,TS val 113583007 ecr 746510095,nop,wscale 7], length 0
20:10:20.393482 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.33338: Flags [S.], cksum 0x17ae (correct), seq 2412780328, ack 1755166149, win 64308, options [mss 1410,sackOK,TS val 113584034 ecr 746510095,nop,wscale 7], length 0
20:10:21.414591 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.33338: Flags [S.], cksum 0x13b1 (correct), seq 2412780328, ack 1755166149, win 64308, options [mss 1410,sackOK,TS val 113585055 ecr 746510095,nop,wscale 7], length 0
20:10:22.438581 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.33338: Flags [S.], cksum 0x0fb0 (correct), seq 2412780328, ack 1755166149, win 64308, options [mss 1410,sackOK,TS val 113586080 ecr 746510095,nop,wscale 7], length 0
20:10:23.472810 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    <Load Balancer Server IP>.25565 > <Third Party Client IP>.33338: Flags [S.], cksum 0x0ba6 (correct), seq 2412780328, ack 1755166149, win 64308, options [mss 1410,sackOK,TS val 113587114 ecr 746510095,nop,wscale 7], length 0

I think I've figured out why I'm seeing this, but I'm not closing this issue until it is confirmed...

stevefan1999-personal commented 1 month ago

I'd like to keep this open, since DSR works with UDP but not with TCP, which seems quite strange on a deeper level: with my homelab setup, DSR should not work in either case.

To summarize:

  1. Tunneled Geneve DSR does not work with TCP if the destination server is behind NAT: the three-way handshake is never observed to complete.
  2. Tunneled Geneve DSR does work with UDP, but that's because UDP uses a fire-and-forget model without a handshake. So, generally, anyone can send data to the client's source IP:port without problems, even from behind NAT.
  3. Something might be wrong in the eBPF datapath's load balancer handling of TCP.
  4. SNAT obviously works, since the endpoint's replies go back through the load balancer rather than directly to the client.
  5. With DSR, the real IP of the server behind the load balancer may also leak at the IP level.
  6. LB in Geneve DSR mode probably works with native routing (?), although it shouldn't.
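Points 1 and 2 can be illustrated with a toy model of the home router's connection tracking. This is a hypothetical sketch, not a confirmed diagnosis: it assumes the router behaves like a typical stateful firewall that only creates TCP state from a SYN (a SYN-ACK for a connection it never saw is treated as invalid and dropped), while any outbound UDP datagram creates state. In DSR mode the client's SYN reaches the backend encapsulated inside the WireGuard tunnel, so the router never sees it as a plain TCP SYN. The class name and placeholder endpoints are illustrative.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Endpoint = Tuple[str, int]


@dataclass(frozen=True)
class Packet:
    proto: str          # "tcp" or "udp"
    src: Endpoint
    dst: Endpoint
    flags: str = ""     # tcpdump-style TCP flags: "S", "S.", "." ...


@dataclass
class ToyStatefulRouter:
    """Hypothetical model of outbound connection tracking on the home router (an assumption)."""
    flows: Set[Tuple[str, Endpoint, Endpoint]] = field(default_factory=set)

    def forward_outbound(self, pkt: Packet) -> bool:
        key = (pkt.proto, pkt.src, pkt.dst)
        if key in self.flows:
            return True                  # packet of an already-tracked flow
        if pkt.proto == "udp" or pkt.flags == "S":
            self.flows.add(key)          # a UDP datagram or a TCP SYN opens a new flow
            return True
        return False                     # e.g. a SYN-ACK with no SYN seen: dropped as invalid


# Placeholder endpoints, matching the redacted names in the captures above.
LB = ("<Load Balancer Server IP>", 25565)
CLIENT = ("<Third Party Client IP>", 38992)

if __name__ == "__main__":
    router = ToyStatefulRouter()
    # DSR + TCP: the backend's SYN-ACK leaves via the router with the LB IP as source.
    print(router.forward_outbound(Packet("tcp", LB, CLIENT, "S.")))  # False
    # DSR + UDP: a reply datagram needs no prior handshake state.
    print(router.forward_outbound(Packet("udp", LB, CLIENT)))        # True
```

Under these assumptions the model matches what the captures show: the SYN-ACKs reach the home router but never arrive at the client, while UDP replies get through. Real routers and ISP networks differ (some also rewrite source addresses or filter spoofed sources), so this only illustrates the hypothesis.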
stevefan1999-personal commented 1 month ago

https://github.com/cilium/cilium/blob/b0a69f3cbcb1b05bf3eca677b4cabee55b260ec4/bpf/lib/nodeport.h#L2311-L2340

I'm not sure how to attach a breakpoint here...