
nixos/k3s: dns not working (reply from unexpected source) #98766

Open DavHau opened 3 years ago

DavHau commented 3 years ago

Describe the bug

I set up k3s via services.k3s.enable = true. DNS was not working in the cluster.

To debug further, I followed the instructions at https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/, with the following results:

# k3s kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
# k3s kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; reply from unexpected source: 10.42.0.3#53, expected 10.43.0.10#53

;; reply from unexpected source: 10.42.0.3#53, expected 10.43.0.10#53
;; reply from unexpected source: 10.42.0.3#53, expected 10.43.0.10#53
;; connection timed out; no servers could be reached

command terminated with exit code 1

Additional context

The host machine uses a static IP address configuration with DNS servers 8.8.8.8 and 8.8.4.4.

Notify maintainers

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
  - k3s
  - services.k3s
aditsachde commented 3 years ago

There are actually a couple of issues with networking and the k3s package.

  1. The issue reported above, which is solved by modprobe br_netfilter. This needs to be added to boot.kernelModules (see the sketch after this list).

  2. If the firewall is off, ip_conntrack is not automatically loaded. k3s tries to load it itself but can fail. @DavHau provided a fix in #98743. (It might also be useful to add iptables to the list of dependencies: having the iptables utility makes networking much easier to debug, and when the firewall is disabled it is not installed.)

  3. The k3s logs contain messages about kube-proxy wanting to load ip_vs, ip_vs_rr, ip_vs_wrr, and ip_vs_sh, similar to: Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules. I loaded these just in case, as k3s does not run kube-proxy inside a container.

  4. Finally, when I tried to use the prebuilt k3s binary, I also had to modprobe overlay, as containerd depends on it. I don't think this is an issue here, since containerd seems to load it correctly, but I added it to my boot.kernelModules just in case.
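
For reference, here is a minimal sketch of the kernel-module side of this on NixOS, assuming the module names above (which ones are actually needed may vary per setup):

{ pkgs, ... }:
{
  # Modules discussed above; br_netfilter is the one that fixes the
  # "reply from unexpected source" DNS symptom.
  boot.kernelModules = [
    "br_netfilter"                             # bridged pod traffic must traverse iptables
    "ip_conntrack"                             # normally loaded by the firewall; missing when it is off
    "ip_vs" "ip_vs_rr" "ip_vs_wrr" "ip_vs_sh"  # IPVS modules kube-proxy complains about
    "overlay"                                  # overlayfs, used by containerd
  ];

  # Keep the iptables CLI around for debugging (per point 2, it may not be
  # on the system when the firewall is disabled).
  environment.systemPackages = [ pkgs.iptables ];
}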

The k3s derivation as a whole works nicely and is much better than naively using the prebuilt k3s binary. There are just a couple of edge cases with networking that mostly boil down to kernel module issues.

/cc @euank

brhoades commented 3 years ago

I can confirm that (1) resolves the issue described here. I ran into this a couple of weeks ago while building out a multi-node k3s cluster.

Ultimately, I also had no networking between pods across nodes with either the vxlan or the host-gw flannel backend, which led me to scrap my cluster. I also saw the module error from (3) and wonder whether that was my original issue.

martijnjanssen commented 3 years ago

Adding br_netfilter to boot.kernelModules unfortunately does not seem to resolve it for me, even though the issue looks very similar. For some reason there is no inter-pod communication possible at all, which seems odd to me, since I haven't done anything special in my configuration. If more information would help, please ask. I've been trying for quite a bit but haven't found a solution anywhere, so I think I need some help. I'm on the 20.09 channel, by the way.

UPDATE: I think it has something to do with the firewall rules; after disabling the firewall, it seems to work. So for people wondering why it won't work: try setting networking.firewall.enable = false; (see below) and letting the cluster start up. Once all pods in kube-system are running, the firewall can be enabled again. Still, this might be something to look at: it seems some rules added to the firewall during k3s startup prevent pods from connecting to each other. For documentation purposes I've included the incorrect firewall rules that blocked pods from connecting at the bottom of this comment.
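
Concretely, the temporary workaround (for debugging only, not a recommendation) is just:

  # Temporary, for debugging only: disable the NixOS firewall entirely.
  networking.firewall.enable = false;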

I have error messages of the same format:

$ k3s kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5
$ k3s kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached

command terminated with exit code 1

I think that these error messages point to an issue somewhere, but I'm not sure what it could be:

$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
E1024 10:32:57.848690       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Service: Get https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
E1024 10:32:57.848690       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Service: Get https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
E1024 10:32:57.848690       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Service: Get https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
I1024 10:32:57.850469       1 trace.go:82] Trace[1556590182]: "Reflector pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98 ListAndWatch" (started: 2020-10-24 10:32:27.849799857 +0000 UTC m=+69691.612882863) (total time: 30.000607943s):
Trace[1556590182]: [30.000607943s] [30.000607943s] END
E1024 10:32:57.850498       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Endpoints: Get https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
E1024 10:32:57.850498       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Endpoints: Get https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
E1024 10:32:57.850498       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Endpoints: Get https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
E1024 10:32:57.850498       1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Endpoints: Get https://10.43.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
$ kubectl logs -n kube-system -f helm-install-traefik-t5dw2
CHART=$(sed -e "s/%{KUBERNETES_API}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/g" <<< "${CHART}")
set +v -x
+ cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /usr/local/share/ca-certificates/
+ update-ca-certificates
WARNING: ca-certificates.crt does not contain exactly one certificate or CRL: skipping
+ export HELM_HOST=127.0.0.1:44134
+ HELM_HOST=127.0.0.1:44134
+ helm_v2 + init --skip-refresh --client-only
tiller --listen=127.0.0.1:44134 --storage=secret
Creating /root/.helm 
Creating /root/.helm/repository 
Creating /root/.helm/repository/cache 
Creating /root/.helm/repository/local 
Creating /root/.helm/plugins 
Creating /root/.helm/starters 
Creating /root/.helm/cache/archive 
Creating /root/.helm/repository/repositories.yaml 
Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com 
Adding local repo with URL: http://127.0.0.1:8879/charts 
$HELM_HOME has been configured at /root/.helm.
Not installing Tiller due to 'client-only' flag having been set
Happy Helming!
++ ++ helm_v2 ls --all '^traefik$' jq -r '.Releases | length'
--output json
[main] 2020/10/24 10:40:01 Starting Tiller v2.12.3 (tls=false)
[main] 2020/10/24 10:40:01 GRPC listening on 127.0.0.1:44134
[main] 2020/10/24 10:40:01 Probes listening on :44135
[main] 2020/10/24 10:40:01 Storage driver is Secret
[main] 2020/10/24 10:40:01 Max history per release is 0
[storage] 2020/10/24 10:40:01 listing all releases with filter
[storage/driver] 2020/10/24 10:40:31 list: failed to list: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%3DTILLER: dial tcp 10.43.0.1:443: i/o timeout
Error: Get https://10.43.0.1:443/api/v1/namespaces/kube-system/secrets?labelSelector=OWNER%!D(MISSING)TILLER: dial tcp 10.43.0.1:443: i/o timeout
+ EXIST=
+ '[' '' == 1 ']'
+ '[' '' == v2 ']'
+ helm_repo_init
+ grep -q -e 'https\?://'
chart path is a url, skipping repo update
+ echo 'chart path is a url, skipping repo update'
+ helm_v3 repo remove stable
Error: no repositories configured
+ true
+ return
+ helm_update install
+ '[' helm_v3 == helm_v3 ']'
++ helm_v3 ls --all-namespaces --all -f '^traefik$' --output json
++ tr ++ jq -r '[:upper:]' '[:lower:]'
'"\(.[0].app_version),\(.[0].status)"'
+ LINE=null,null
++ echo null,null
++ cut -f1 -d,
+ INSTALLED_VERSION=null
++ echo null,null
++ cut -f2 -d,
+ STATUS=null
+ '[' -e /config/values.yaml ']'
+ VALUES='--values /config/values.yaml'
+ '[' install = delete ']'
+ '[' -z null ']'
+ '[' null = deployed ']'
+ '[' null = failed ']'
+ '[' null = deleted ']'
+ helm_v3 install traefik https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz --values /config/values.yaml
Error: failed to download "https://10.43.0.1:443/static/charts/traefik-1.81.0.tgz" (hint: running `helm repo update` may help)

Incorrect firewall rules:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
nixos-fw   all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
ACCEPT     all  --  nixos-server/16      anywhere            
ACCEPT     all  --  anywhere             nixos-server/16     

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination         

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-PROXY-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-SERVICES (3 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             10.43.0.10           /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:domain reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             10.43.0.10           /* kube-system/kube-dns:metrics has no endpoints */ tcp dpt:9153 reject-with icmp-port-unreachable
REJECT     udp  --  anywhere             10.43.0.10           /* kube-system/kube-dns:dns has no endpoints */ udp dpt:domain reject-with icmp-port-unreachable

Chain nixos-fw (1 references)
target     prot opt source               destination         
nixos-fw-accept  all  --  anywhere             anywhere            
nixos-fw-accept  all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
nixos-fw-accept  tcp  --  anywhere             anywhere             tcp dpt:ssh
nixos-fw-accept  udp  --  anywhere             anywhere             udp dpt:ssh
nixos-fw-accept  udp  --  anywhere             anywhere             udp dpt:51820
nixos-fw-accept  icmp --  anywhere             anywhere             icmp echo-request
nixos-fw-log-refuse  all  --  anywhere             anywhere            

Chain nixos-fw-accept (6 references)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            

Chain nixos-fw-log-refuse (1 references)
target     prot opt source               destination         
LOG        tcp  --  anywhere             anywhere             tcp flags:FIN,SYN,RST,ACK/SYN LOG level info prefix "refused connection: "
nixos-fw-refuse  all  --  anywhere             anywhere             PKTTYPE != unicast
nixos-fw-refuse  all  --  anywhere             anywhere            

Chain nixos-fw-refuse (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
aditsachde commented 3 years ago

@martijnjanssen k3s specifies a list of ports that need to be opened, and I don't think they're opened by the k3s module. Chain nixos-fw does not include an accept rule for UDP 8472, the port used for in-cluster communication, so that's probably why disabling the firewall fixes it. If doing that solves your issue, we should probably also add those ports to the k3s module. (Maybe we should open another, more general issue to work out the kinks in k3s.)
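
If it is that port, a minimal sketch on the host side would be something like the following (with the caveat that this exposes the vxlan port to whatever network the node sits on):

  # Flannel's default vxlan backend port; only open this on a trusted network.
  networking.firewall.allowedUDPPorts = [ 8472 ];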

martijnjanssen commented 3 years ago

@aditsachde Thanks for the response! I actually don't think opening ports by default is smart (and it wasn't the fix after all), since they are clearly documented in the k3s readme. What fixed it for me was adding:

networking.firewall.extraCommands = ''
  iptables -I INPUT 3 -s 10.42.0.0/16 -j ACCEPT
  iptables -I INPUT 3 -d 10.42.0.0/16 -j ACCEPT
'';

I found this solution in a k3s issue where users were facing the same problem: https://github.com/rancher/k3s/issues/977#issuecomment-552504848. I've checked, and without changing anything else this fixes inter-pod communication. To determine which subnet to allow, the ip route command can help; here the cni0 device is the k3s interface (I think; I'm not very experienced with networking yet):

$ ip route
default via 192.168.1.1 dev enp1s0 proto dhcp src 192.168.1.3 metric 202 
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1 
192.168.1.0/24 dev enp1s0 proto dhcp scope link src 192.168.1.3 metric 202
euank commented 3 years ago

I think the kernel-module related issues you mention above, including the overlay one, might be fixed by #101744, @aditsachde. Thanks for the detailed info on those module issues and for the ping! I'm optimistic that those kernel module issues were the reason coredns couldn't start up, and were thus the root cause of this issue.

The firewall stuff is unfortunately more complicated. Part of the problem is that k3s has so many knobs (e.g. the cluster CIDR can be configured, you can use any of several overlay networks, etc.). Another part is that k3s defaults to flannel vxlan, but opening up the vxlan port (udp 8472) is insecure unless you trust your network, so we can't exactly default to that either.

We can definitely still improve the usability of the nixos module, even if we don't have a good secure default.

aditsachde commented 3 years ago

@martijnjanssen Thank you for linking that issue. I was still having some networking issues, but they were occurring on an Ubuntu-based cluster as well so I just assumed it was because of something on my network. Adding those rules fixed it.

@euank When I get a chance sometime later, I'll test out #101744 and see if the kernel modules work out of the box. I don't think we can possibly cover all firewall details, but we can probably assume that someone configuring the CIDR or swapping out the overlay network is capable of configuring the firewall. I do agree that opening the firewall by default is probably a bad idea. However, an openFirewall option that defaults to false, like many other modules have, might be a decent compromise.
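
Roughly what I have in mind is something like the sketch below (hypothetical, not something the module has today; the exact port list would need review):

{ config, lib, ... }:
let
  cfg = config.services.k3s;
in
{
  options.services.k3s.openFirewall = lib.mkOption {
    type = lib.types.bool;
    default = false;
    description = "Whether to open the ports k3s needs in the firewall.";
  };

  config = lib.mkIf (cfg.enable && cfg.openFirewall) {
    # 6443/tcp: Kubernetes API server; 8472/udp: flannel vxlan backend.
    networking.firewall.allowedTCPPorts = [ 6443 ];
    networking.firewall.allowedUDPPorts = [ 8472 ];
  };
}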

aditsachde commented 3 years ago

I've been having some different networking issues, where all requests result in a socket error or an i/o timeout, seemingly across all pods, but definitely with coredns and ingress-nginx. This happens both with k3s on unstable and with #101744, so I can't say whether or not the PR solves the kernel module issues.

I've managed to have tons of networking issues with k3s on Ubuntu as well as NixOS, with only K3OS working reliably. I'd love to get to the bottom of it, but I'm really not sure where to start debugging.

jbalme commented 3 years ago

I've (fingers crossed) got k3s into a state where it seems to be working properly on 20.09. Here's the module I'm using:

{ lib, ... }:
{
  services.k3s = {
    enable = true;
    extraFlags = "--no-deploy traefik";
  };

  # https://github.com/NixOS/nixpkgs/issues/103158
  systemd.services.k3s.after = [ "network-online.target" "firewall.service" ];
  systemd.services.k3s.serviceConfig.KillMode = lib.mkForce "control-group";

  # https://github.com/NixOS/nixpkgs/issues/98766
  boot.kernelModules = [ "br_netfilter" "ip_conntrack" "ip_vs" "ip_vs_rr" "ip_vs_wrr" "ip_vs_sh" "overlay" ];  
  networking.firewall.extraCommands = ''
    iptables -A INPUT -i cni+ -j ACCEPT
  '';
}

Of particular note: since I'm starting k3s after the firewall, I can use a simple -A INPUT instead of -I INPUT 3, and I'm matching on the interface prefix instead of a subnet, so it should be less fragile.

cmrfrd commented 3 years ago

Can confirm @jbalme 's solution works for me.

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.

BeneSim commented 2 years ago

Similar to @jbalme's solution, you could also add cni+ to networking.firewall.trustedInterfaces, which according to https://github.com/NixOS/nixpkgs/blob/a28adc36c20fd2fbaeb06ec9bbd79b6bf7443979/nixos/modules/services/networking/firewall.nix#L136-L139 does pretty much the same thing.

  networking.firewall.trustedInterfaces = [ "cni+" ];

So far it's working for me ;)