angelnu / pod-gateway

Container image used to set a pod gateway
Apache License 2.0

Fix pod-gateway when using Cilium #52

b-tuma closed 1 month ago

b-tuma commented 9 months ago

Description of the change

When using Cilium, pod-gateway clients can't reach the gateway because Cilium blocks the encapsulated traffic. However, if we explicitly set a destination port for the tunnel, Cilium treats the encapsulated traffic as ordinary inter-pod UDP traffic.

The changes needed to get an existing deployment working with Cilium are:

I've let the user choose whatever port they want, but IANA lists 4789 as the expected UDP port for VXLAN. Some documentation says that dstport 0 falls back to the "default" port (which I assumed was 4789), but that's not the behavior I saw in testing. It seems to depend on the implementation, since in the past the default was port 8472.
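As a rough illustration of what setting the destination port means, here is a minimal sketch that only prints the iproute2 command rather than running it (creating the link needs root). The function name `vxlan_link_cmd`, the device names `vxlan0` and `eth0`, and the VXLAN ID 42 are assumptions for the example, not values taken from this PR.

```shell
# Sketch only: build (but do not execute) the iproute2 command that creates
# a VXLAN link with an explicit UDP destination port.
# vxlan0, eth0 and the default id 42 are illustrative assumptions.
vxlan_link_cmd() {
  port="$1"          # UDP destination port, e.g. 4789
  id="${2:-42}"      # VXLAN network identifier (hypothetical default)
  echo "ip link add vxlan0 type vxlan id ${id} dev eth0 dstport ${port}"
}

# Print the command for the IANA-assigned port.
vxlan_link_cmd 4789
```

Both ends of the tunnel would need to use the same port for the encapsulated traffic to be accepted.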

Benefits

Allows using pod-gateway with Cilium :)

Possible drawbacks

To avoid breaking current deployments of pod-gateway, the VXLAN_PORT variable falls back to "0" if the env variable is unset. Even though I've added VXLAN_PORT to settings.sh, people may be mounting that file from a ConfigMap (like I do) and could miss the new setting.
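The fallback described above could look roughly like the following sketch. The helper name `resolve_vxlan_port` and the range check are assumptions added for illustration; the actual scripts in this PR may do it differently. Note that `${VXLAN_PORT:-0}` also treats an empty value as unset.

```shell
# Sketch only: resolve the VXLAN destination port, defaulting to 0 so that
# deployments without VXLAN_PORT keep their previous behavior.
# resolve_vxlan_port is a hypothetical helper, not part of pod-gateway.
resolve_vxlan_port() {
  port="${VXLAN_PORT:-0}"   # unset or empty falls back to 0
  case "$port" in
    *[!0-9]*) echo "invalid VXLAN_PORT: $port" >&2; return 1 ;;
  esac
  # Reject values outside the valid UDP port range.
  [ "$port" -le 65535 ] 2>/dev/null || { echo "VXLAN_PORT out of range: $port" >&2; return 1; }
  echo "$port"
}
```

A script could then pass `dstport "$(resolve_vxlan_port)"` when creating the link, so a ConfigMap-mounted settings file that lacks the new variable still works.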

Applicable issues

Additional information

For reference, this is what happens in the gateway-init sidecar when using Cilium and dstport 0:

+ ping -c 1 172.16.0.1
PING 172.16.0.1 (172.16.0.1): 56 data bytes

--- 172.16.0.1 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

And this is what happens to gateway-admision-controller if you add the VXLAN_PORT and container port but not the service:

{"addr":":8080","app":"gateway-admision-controller","http-server":"webhooks","level":"info","lib":"kubewebhook","msg":"Command config is config.CmdConfig{Debug:false, Development:false, SetGatewayDefault:false, WebhookListenAddr:\":8080\", MetricsListenAddr:\"\", MetricsPath:\"\", TLSCertFilePath:\"/tls/tls.crt\", TLSKeyFilePath:\"/tls/tls.key\", Gateway:\"vpn-gateway.vpn-gateway.svc.cluster.local\", DNS:\"172.16.0.1\", DNSPolicy:\"None\", SetGatewayLabel:\"vpn\", SetGatewayLabelValue:\"\", SetGatewayAnnotation:\"vpn\", SetGatewayAnnotationValue:\"\", InitImage:\"ghcr.io/b-tuma/pod-gateway-udp:v1.11.0\", InitImagePullPol:\"IfNotPresent\", InitCmd:\"/bin/client_init.sh\", InitMountPoint:\"/config\", SidecarImage:\"ghcr.io/b-tuma/pod-gateway-udp:v1.11.0\", SidecarImagePullPol:\"IfNotPresent\", SidecarCmd:\"/bin/client_sidecar.sh\", SidecarMountPoint:\"/config\", ConfigmapName:\"vpn-gateway\", VxlanPort:0}","service":"webhook-handler","time":"2024-03-04T22:10:11Z","version":"dev","webhook":"gatewayPodMutator"}
error running app: could not create webhooks handler: could not register routes on handler: error creating webhook mutator: lookup vpn-gateway.vpn-gateway.svc.cluster.local on 172.30.0.10:53: no such host

TheAceMan commented 7 months ago

I still can't get it working with your changes. I recently moved to Cilium and pod-gateway stopped working; it was working before with flannel. This is the tail of the gateway-init log. It has the right pod IP, and I believe I have configured the port and service as you outlined. This ping occurs just before your changes in client_init.sh, so I'm not sure if this is a separate issue or related to Cilium. Was any Cilium configuration required?


+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
13774: eth0@if13775: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7a:5f:3b:7f:6a:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.42.0.123/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::785f:3bff:fe7f:6ac8/64 scope link
       valid_lft forever preferred_lft forever
+ ip route
10.42.0.0/16 via 10.42.0.117 dev eth0
10.42.0.117 dev eth0 scope link
10.42.0.146 via 10.42.0.117 dev eth0
10.43.0.0/16 via 10.42.0.117 dev eth0
192.168.0.0/21 via 10.42.0.117 dev eth0
+ ping -c 1 10.42.0.146
PING 10.42.0.146 (10.42.0.146): 56 data bytes

--- 10.42.0.146 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

a-bran commented 4 months ago

Thanks, this has worked for me on Cilium. No idea why they block 8472 but not 4789. There were some issues open there too, but after getting little attention they were auto-closed.

The policies applied through Cilium can be a little finicky; sometimes they overblock when you wouldn't expect them to. Using Hubble has helped find these issues, though.

One issue with Cilium is that the IPv6 route gets re-set after pod-gateway's client-init has deleted it. This requires either some modifications or a workaround; for now I have simply blocked IPv6 traffic leaving the namespace.

TheAceMan commented 4 months ago

I had a chance to dig in more and this does work, thank you! The issues I ran into were caused by a firewall in the gluetun VPN addon. Gluetun contains a firewall by default, which I needed to either disable or extend with additional iptables entries. But I did get it working and can confirm that this pod-gateway works with gluetun on Cilium as well.