flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes
Apache License 2.0

Flannel's routing (in k3s) doesn't support local routing? #1967

Open IngwiePhoenix opened 1 month ago

IngwiePhoenix commented 1 month ago

Expected Behavior

I am running k3s with the default Flannel. So far it has been working quite well, but I have come across a bizarre routing "issue" that I honestly cannot wrap my mind around. All I have done is set this option for flannel:

flannel-external-ip: true
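For reference, the full configuration looks roughly like this in k3s's config file (a sketch of my setup; the `node-external-ip` value is my node's LAN address, mentioned below):

```yaml
# /etc/rancher/k3s/config.yaml (sketch)
node-external-ip: 192.168.1.3
flannel-external-ip: true
```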

Current Behavior

When I run a cURL request from a different system on the network that should hit Traefik (and it does), the source IP is correctly reported as that remote system's IP. But when I run the exact same request ON the node itself, the CNI interface's IP is reported as the source. On both hosts, router.birb.it resolves to 192.168.1.3, which is the node's IP, but only on the host running k3s/flannel does the "wrong" IP get reported as the client.

Now, granted, my NAT-fu is quite bad, so I tried to Google the issue and found that others have worked around similar problems by using host-gw mode. But I plan to add another node connected via Headscale/Tailscale, which to my knowledge does not really do Layer 2, so host-gw may not be an option. What I did find along the way is this:

 iptables -L -v -n -t nat | grep traefik
    2   120 KUBE-SVC-CVG3OEGEH7H5P3HQ  0    --  *      *       10.42.0.0/16         0.0.0.0/0            /* pod traffic for kube-system/traefik:websecure external destinations */
    1    60 KUBE-MARK-MASQ  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* masquerade LOCAL traffic for kube-system/traefik:websecure external destinations */ ADDRTYPE match src-type LOCAL
    1    60 KUBE-SVC-CVG3OEGEH7H5P3HQ  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* route LOCAL traffic for kube-system/traefik:websecure external destinations */ ADDRTYPE match src-type LOCAL
    0     0 KUBE-SVC-UQMCRMJZLI3FTLDP  0    --  *      *       10.42.0.0/16         0.0.0.0/0            /* pod traffic for kube-system/traefik:web external destinations */
    0     0 KUBE-MARK-MASQ  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* masquerade LOCAL traffic for kube-system/traefik:web external destinations */ ADDRTYPE match src-type LOCAL
    0     0 KUBE-SVC-UQMCRMJZLI3FTLDP  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* route LOCAL traffic for kube-system/traefik:web external destinations */ ADDRTYPE match src-type LOCAL
    0     0 KUBE-EXT-UQMCRMJZLI3FTLDP  6    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:web */ tcp dpt:31228
    0     0 KUBE-EXT-CVG3OEGEH7H5P3HQ  6    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:websecure */ tcp dpt:30290
    0     0 KUBE-MARK-MASQ  0    --  *      *       10.42.0.28           0.0.0.0/0            /* kube-system/traefik:websecure */
    3   180 DNAT       6    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:websecure */ tcp to:10.42.0.28:8443
    0     0 KUBE-MARK-MASQ  0    --  *      *       10.42.0.28           0.0.0.0/0            /* kube-system/traefik:web */
    0     0 DNAT       6    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:web */ tcp to:10.42.0.28:8000
    0     0 KUBE-SVC-UQMCRMJZLI3FTLDP  6    --  *      *       0.0.0.0/0            10.43.230.62         /* kube-system/traefik:web cluster IP */ tcp dpt:80
    0     0 KUBE-EXT-UQMCRMJZLI3FTLDP  6    --  *      *       0.0.0.0/0            192.168.1.3          /* kube-system/traefik:web loadbalancer IP */ tcp dpt:80
    0     0 KUBE-SVC-CVG3OEGEH7H5P3HQ  6    --  *      *       0.0.0.0/0            10.43.230.62         /* kube-system/traefik:websecure cluster IP */ tcp dpt:443
    0     0 KUBE-EXT-CVG3OEGEH7H5P3HQ  6    --  *      *       0.0.0.0/0            192.168.1.3          /* kube-system/traefik:websecure loadbalancer IP */ tcp dpt:443
    0     0 KUBE-MARK-MASQ  6    --  *      *      !10.42.0.0/16         10.43.230.62         /* kube-system/traefik:websecure cluster IP */ tcp dpt:443
    3   180 KUBE-SEP-ANBVRI63WCYGMRAF  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:websecure -> 10.42.0.28:8443 */
    0     0 KUBE-MARK-MASQ  6    --  *      *      !10.42.0.0/16         10.43.230.62         /* kube-system/traefik:web cluster IP */ tcp dpt:80
    0     0 KUBE-SEP-OE364SENQTYPW4SB  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:web -> 10.42.0.28:8000 */
    0     0 KUBE-SEP-ANBVRI63WCYGMRAF  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:websecure -> 10.42.0.28:8443 */
    0     0 KUBE-SEP-OE364SENQTYPW4SB  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-system/traefik:web -> 10.42.0.28:8000 */

What sticks out to me are the LOCAL lines, which indicate that locally-originated traffic is routed (and masqueraded) differently than external traffic. This also means that the origin IP is lost, or rather, not what one would expect it to be.
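If it helps, the mark-and-masquerade mechanism those LOCAL rules feed into can be inspected directly; these are the standard kube-proxy chains (same chain names as in the dump above, commands need root on the node):

```shell
# KUBE-MARK-MASQ only sets a firewall mark on the packet;
# KUBE-POSTROUTING then SNATs any packet carrying that mark,
# which is what replaces the original source IP for local clients.
iptables -t nat -L KUBE-MARK-MASQ -v -n
iptables -t nat -L KUBE-POSTROUTING -v -n
```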

Possible Solution

Honestly, I have no idea. I am quite literally lost.

Steps to Reproduce (for bugs)

  1. Deploy k3s, use the node-external-ip and flannel-external-ip options. Also, modify the Traefik config to set externalTrafficPolicy: Local.
  2. Bring up a service like whoami. Not necessary, but it might help.
  3. On the k3s node, curl something on the external IP and observe the Traefik logs.
  4. On another adjacent node - on the same network - run the same command.
  5. You should now see two different IPs reported as the client IPs in Traefik.
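Concretely, steps 3 and 4 are the same command run from two places (hostname and IPs are from my setup):

```shell
# On the k3s node itself (192.168.1.3):
# the client IP in Traefik's log shows up as the CNI address (10.42.0.1).
curl --head https://router.birb.it

# On another machine on the same LAN (e.g. 192.168.1.4):
# the real client IP (192.168.1.4) is preserved.
curl --head https://router.birb.it
```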

Example:

# External:
192.168.1.4 - - [10/May/2024:10:08:23 +0000] "HEAD / HTTP/2.0" 200 0 "-" "curl/8.7.1" 1176 "proxy-router-proxy-tr-69e06c5f6c42361dc8b1@kubernetescrd" "http://192.168.1.1:80" 41ms

# Local:
10.42.0.1 - - [10/May/2024:10:08:52 +0000] "HEAD / HTTP/2.0" 200 0 "-" "curl/7.88.1" 1183 "proxy-router-proxy-tr-2ed3520c3d71313735b6@kubernetescrd" "http://192.168.1.1:80" 4ms

(This is an external name service I use to reverse-proxy to my modem's UI - it was the smallest deployment to experiment with.)

Context

I am trying to set up Traefik middlewares to configure forward authentication and IP range blacklisting. Everything on my local network should always be allowed (match: Host(...) && ClientIP("192.168.1.0/24")) and everything else should require authentication through a middleware. This already mostly works, except that I cannot access anything from the node itself, which is a bummer and might become a problem long-term...
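What I am aiming for looks roughly like this as a Traefik IngressRoute (a sketch only; the resource, service, and middleware names are placeholders, the match rules are the ones described above):

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: router-proxy   # placeholder name
spec:
  entryPoints:
    - websecure
  routes:
    # LAN clients match here and skip authentication entirely.
    - match: Host(`router.birb.it`) && ClientIP(`192.168.1.0/24`)
      kind: Rule
      services:
        - name: router-proxy   # placeholder service
          port: 80
    # Everyone else falls through to the authenticated route.
    - match: Host(`router.birb.it`)
      kind: Rule
      middlewares:
        - name: forward-auth   # placeholder ForwardAuth middleware
      services:
        - name: router-proxy
          port: 80
```

The problem is that requests made from the node itself arrive with the CNI IP (10.42.0.1), so they never match the `ClientIP("192.168.1.0/24")` rule.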

Your Environment

IngwiePhoenix commented 1 month ago

Something additional I noticed: when I request through the VPN, I only see the ServiceLB pod's IP. Not that this is directly relevant here, but it is related at the very least.

# On remote VPS:
root@birb ~# tailscale ip -4 cluserboi
100.64.0.2
root@birb ~# curl --resolve router.birb.it:443:100.64.0.2 --head -L https://router.birb.it
HTTP/2 200
...

# On k3s node's Traefik log:
10.42.0.31 - - [10/May/2024:11:06:51 +0000] "HEAD / HTTP/2.0" 200 0 "-" "curl/7.81.0" 1882 "proxy-router-proxy-tr-2ed3520c3d71313735b6@kubernetescrd" "http://192.168.1.1:80" 4ms

# And, the IP to the pod:
# kubectl get -A pods -o=jsonpath="{range .items[*]}{.metadata.name}{'='}{.status.podIP}{','}{end}" | tr "," "\n" | grep 31
svclb-traefik-2ae61580-zlqp8=10.42.0.31

The VPN is configured with 100.64.0.0/24, so it is an entirely different subnet.

rbrtbnfgl commented 1 month ago

Are you using Tailscale to connect the nodes of the cluster? I think this is not a Flannel bug; your setup probably needs a specific configuration for what you are trying to do. Considering that the iptables rules you shared are not part of Flannel, could you check your routing table? I think that locally the node knows which pod is hosting the service, and the routing process will use one of the host's own IPs on that subnet as the source.
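For checking the routing side of this, something along these lines on the node should show why local traffic never leaves the host (standard iproute2 commands; the IP is the node address from the report above):

```shell
# The node's own address is served from the kernel's 'local' routing table,
# so traffic to 192.168.1.3 from the node itself is short-circuited via lo
# and only then hits the kube-proxy NAT rules.
ip route get 192.168.1.3
ip route show table local
ip route show
```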