cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

bpf: recreate CT entry if proxy_redirect is stale for non-tcp #33222

ysksuzuki closed this 5 days ago

ysksuzuki commented 1 week ago

This commit fixes an issue where the datapath erroneously redirects reply packets to the proxy (or fails to redirect them) when a packet hits a stale CT entry.

PR #32653 fixed this for TCP by making __ct_lookup return CT_NEW when the packet hits a stale entry that is closing. The caller can then recreate the entry, which updates the proxy_redirect flag.

This commit lets the datapath recreate the entry for non-TCP protocols in the analogous case, so that the proxy_redirect flag is updated there as well.

This problem can occur, for example, when the DNS proxy (dns-proxy) is in use:

kubectl -n cilium-test exec client3-7557dd665c-csxh6 -- curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 http://echo-external-node.cilium-test.svc.cluster.local:8080/client-ip
curl: (28) Resolving timed out after 2001 milliseconds
command terminated with exit code 28

// The reply packets from CoreDNS are dropped: they arrive on cilium_vxlan but are never delivered to the client's lxc interface
kubectl exec debug-8sx77 -- tcpdump -nl -i any udp and host 10.244.1.159 and port 53
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
03:43:28.350752 lxc7273688205e1 In  IP 10.244.1.159.34716 > 10.244.0.36.53: 1292+ A? echo-external-node.cilium-test.svc.cluster.local.cilium-test.svc.cluster.local. (96)
03:43:28.350883 cilium_vxlan Out IP 10.244.1.159.34716 > 10.244.0.36.53: 1292+ A? echo-external-node.cilium-test.svc.cluster.local.cilium-test.svc.cluster.local. (96)
03:43:28.351919 cilium_vxlan P   IP 10.244.0.36.53 > 10.244.1.159.34716: 1292 NXDomain*- 0/1/0 (189)
03:43:28.352224 lxc7273688205e1 In  IP 10.244.1.159.48844 > 10.244.0.108.53: 51484+ A? echo-external-node.cilium-test.svc.cluster.local.svc.cluster.local. (84)
03:43:28.352295 cilium_vxlan Out IP 10.244.1.159.48844 > 10.244.0.108.53: 51484+ A? echo-external-node.cilium-test.svc.cluster.local.svc.cluster.local. (84)
03:43:28.353106 cilium_vxlan P   IP 10.244.0.108.53 > 10.244.1.159.48844: 51484 NXDomain*- 0/1/0 (177)

// Stale CT entries for these DNS flows (the second one carries the ProxyRedirect flag)
kubectl -n kube-system exec cilium-f65hc -- cilium bpf ct list global 
UDP OUT 10.244.1.159:34716 -> 10.244.0.36:53 expires=1165307 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=1165247 TxFlagsSeen=0x00 LastTxReport=1165247 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=5242 IfIndex=0 
UDP OUT 10.244.1.159:48844 -> 10.244.0.108:53 expires=1165307 Packets=0 Bytes=0 RxFlagsSeen=0x00 LastRxReport=1165247 TxFlagsSeen=0x00 LastTxReport=1165247 Flags=0x0040 [ ProxyRedirect ] RevNAT=0 SourceSecurityID=5242 IfIndex=0 

Recreate CT entries for non-TCP to fix L7 proxy redirect failures.

ysksuzuki commented 1 week ago

/test

ysksuzuki commented 1 week ago

/test

julianwiedmann commented 6 days ago

@jrajahalme if you agree, please make sure to also add the ready-to-merge label :pray: