NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.09k stars 14.14k forks

DNS does not work with Raspberry Pi as k3s agent #175513

Closed collinarnett closed 2 years ago

collinarnett commented 2 years ago

Describe the bug

Pods created on a Raspberry Pi 4 (registered as an agent node) cannot resolve DNS names.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Create x86 server k3s node k3s.nix
     networking.firewall.allowedTCPPorts = [ 6443 ];
     services.k3s = {
       enable = true;
       role = "server";
     };
  2. Create Raspberry Pi agent node (or other arm64 node)
     services.k3s = {
       enable = true;
       role = "agent";
       tokenFile = config.sops.secrets.k3s_agent_token.path;
       serverAddr = "https://192.168.1.164:6443";
     };
  3. Run the following, where eye is the Raspberry Pi's hostname:
     kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 --overrides='{"spec": {"nodeSelector": {"kubernetes.io/hostname": "eye"}}}' -- nslookup google.com
Address 1: 10.43.0.10

nslookup: can't resolve 'google.com'
pod "busybox" deleted
pod default/busybox terminated (Error)

Expected behavior

DNS should work within pods.


Additional context

$ k get nodes                                                                                                                                                    
[sudo] password for collin: 
NAME     STATUS   ROLES                  AGE   VERSION
eye      Ready    <none>                 39h   v1.23.6+k3s1
zombie   Ready    control-plane,master   39h   v1.23.6+k3s1
$ k describe node eye          
Name:               eye
Roles:              <none>
Labels:             beta.kubernetes.io/arch=arm64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=arm64
                    kubernetes.io/hostname=eye
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=k3s
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"de:44:43:ee:86:5e"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.226
                    k3s.io/hostname: eye
                    k3s.io/internal-ip: 192.168.1.226
                    k3s.io/node-args: ["agent","--server","https://192.168.1.164:6443","--token-file","/run/secrets/k3s_agent_token"]
                    k3s.io/node-config-hash: Y7QC7PRE6TQ2YVSBKXANZBS76INFIQNDBB4L4LXUT6GWEU76MQ2A====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/cc7520e8ee9e492e4c319134621f1510d251931c243270a65828d98e2e9fe39a"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 28 May 2022 23:57:00 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  eye
  AcquireTime:     <unset>
  RenewTime:       Mon, 30 May 2022 15:23:20 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 30 May 2022 15:22:33 -0400   Sun, 29 May 2022 00:44:54 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 30 May 2022 15:22:33 -0400   Sun, 29 May 2022 00:44:54 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 30 May 2022 15:22:33 -0400   Sun, 29 May 2022 00:44:54 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 30 May 2022 15:22:33 -0400   Sun, 29 May 2022 00:45:05 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.1.226
  Hostname:    eye
Capacity:
  cpu:                4
  ephemeral-storage:  30992688Ki
  memory:             3877804Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  30149686863
  memory:             3877804Ki
  pods:               110
System Info:
  Machine ID:                 c031a79318e946c599111cab55931e93
  System UUID:                c031a79318e946c599111cab55931e93
  Boot ID:                    9fb7819c-c100-4273-959f-f927760256b7
  Kernel Version:             5.15.32
  OS Image:                   NixOS 22.05 (Quokka)
  Operating System:           linux
  Architecture:               arm64
  Container Runtime Version:  containerd://1.5.11-k3s2
  Kubelet Version:            v1.23.6+k3s1
  Kube-Proxy Version:         v1.23.6+k3s1
PodCIDR:                      10.42.1.0/24
PodCIDRs:                     10.42.1.0/24
ProviderID:                   k3s://eye
Non-terminated Pods:          (1 in total)
  Namespace                   Name                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                   ------------  ----------  ---------------  -------------  ---
  kube-system                 svclb-traefik-vrps8    0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
Events:              <none>
$ k describe node zombie
Name:               zombie
Roles:              control-plane,master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=zombie
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"32:9a:06:88:01:25"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.1.164
                    k3s.io/hostname: zombie
                    k3s.io/internal-ip: 192.168.1.164
                    k3s.io/node-args: ["server"]
                    k3s.io/node-config-hash: FJUI3FU6Q3RO32VPY7PBXHS6ENEVSNWWAPPIABEB6YHOY4CVDOBA====
                    k3s.io/node-env: {"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/69cf0b0968d8bd11e910aa0aa06836920c6baa71330313240eacb9a6c71ad731"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 28 May 2022 23:53:38 -0400
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  zombie
  AcquireTime:     <unset>
  RenewTime:       Mon, 30 May 2022 15:25:10 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 30 May 2022 15:22:02 -0400   Sat, 28 May 2022 23:53:38 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 30 May 2022 15:22:02 -0400   Sat, 28 May 2022 23:53:38 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 30 May 2022 15:22:02 -0400   Sat, 28 May 2022 23:53:38 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 30 May 2022 15:22:02 -0400   Sat, 28 May 2022 23:53:48 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.1.164
  Hostname:    zombie
Capacity:
  cpu:                24
  ephemeral-storage:  479081160Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65847092Ki
  pods:               110
Allocatable:
  cpu:                24
  ephemeral-storage:  466050152083
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65847092Ki
  pods:               110
System Info:
  Machine ID:                 440d6715398c4a968f8638b885267bf9
  System UUID:                b9c28570-b664-0000-0000-000000000000
  Boot ID:                    29e5c923-0919-4341-a484-595927f409c5
  Kernel Version:             5.15.41
  OS Image:                   NixOS 22.11 (Raccoon)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.5.11-k3s2
  Kubelet Version:            v1.23.6+k3s1
  Kube-Proxy Version:         v1.23.6+k3s1
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24
ProviderID:                   k3s://zombie
Non-terminated Pods:          (18 in total)
  Namespace                   Name                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                            ------------  ----------  ---------------  -------------  ---
  kube-system                 local-path-provisioner-6c79684f77-mvhlj         0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  kube-system                 coredns-d76bd69b-ghzj4                          100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     39h
  kube-system                 metrics-server-7cd5fcb6b7-9ch6x                 100m (0%)     0 (0%)      70Mi (0%)        0 (0%)         39h
  kube-system                 svclb-traefik-d8k9w                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  kube-system                 traefik-df4ff85d6-bssq4                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-ui-6557c5b4c6-k9dps                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-redis-55f479c986-kfj8b                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-kuiper-8568c55bcf-b7psp                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-core-consul-556646c58c-n7fn6              0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-device-rest-78b56d8845-hxj6v              0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-support-scheduler-5fbc66d989-wz4pg        0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-core-metadata-b54b8f4fc-zr57g             0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-device-virtual-77bb5bc9d4-k88c6           0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-sys-mgmt-agent-d8d7c68d6-5g8br            0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-core-data-66b94ccf9d-rmkx7                0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-support-notifications-7cccb6d898-59c2h    0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-core-command-6c8945955c-h2wl9             0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
  edgex                       edgex-app-rules-engine-67d754b9f8-pd2dr         0 (0%)        0 (0%)      0 (0%)           0 (0%)         39h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                200m (0%)   0 (0%)
  memory             140Mi (0%)  170Mi (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

Notify maintainers

@euank @Mic92 @superherointj @kalbasit

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

this path will be fetched (0.00 MiB download, 0.00 MiB unpacked):
  /nix/store/qypk3gzaf4p5bhzmyzjmlycs8v2sdw2h-nix-info
copying path '/nix/store/qypk3gzaf4p5bhzmyzjmlycs8v2sdw2h-nix-info' from 'https://cache.nixos.org'...
 - system: `"x86_64-linux"`
 - host os: `Linux 5.15.41, NixOS, 22.11 (Raccoon)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.8.1`
 - channels(collin): `""`
 - channels(root): `"nixos-21.11.335130.386234e2a61"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
euank commented 2 years ago

I suspect this might be firewall related.

Can you see if:

networking.firewall.trustedInterfaces = [ "cni0" "flannel.1" ];
networking.firewall.extraCommands = ''
    # flannel vxlan traffic is UDP, so match on -p udp
    iptables -A nixos-fw -p udp --source 192.168.1.0/24 --dport 8472 -j nixos-fw-accept
'';

makes a difference, or if that doesn't, whether temporarily disabling the firewall does (assuming you can safely do that experiment)?

The above flannel vxlan port is taken from https://rancher.com/docs/k3s/latest/en/installation/installation-requirements/#networking

The k3s module doesn't do any firewall setup, even though it really should by default... but before going off into the weeds there, let's verify that this is the issue, and that it's not something else.

(thanks for trying the module and reporting this issue btw!)
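For reference, a minimal sketch of the suggestion above dropped into the agent's NixOS configuration (the 192.168.1.0/24 subnet is the LAN used in this thread; adjust to your own network):

```nix
networking.firewall = {
  # Trust the CNI bridge and flannel vxlan interfaces so pod traffic
  # on this node isn't filtered.
  trustedInterfaces = [ "cni0" "flannel.1" ];
  # Accept flannel vxlan traffic (UDP 8472) from the local subnet.
  extraCommands = ''
    iptables -A nixos-fw -p udp --source 192.168.1.0/24 --dport 8472 -j nixos-fw-accept
  '';
};
```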

collinarnett commented 2 years ago

Thank you so much for getting back to me so quickly.

I tried both of the suggestions you made and neither of them worked. Running without a firewall and running the busybox test with your configuration resulted in the same output, although I'm not sure if I should be running any additional commands other than nixos-rebuild switch in between changes.

I had a look at: https://github.com/TUM-DSE/doctor-cluster-config/blob/master/modules/k3s/k3s-reset-node but I'm not sure if it's necessary to run that after any configuration change.

superherointj commented 2 years ago

Works for me using an RPi4.

  boot.kernelModules = [ "overlay" "br_netfilter" ]; # Needed for K3s?
  boot.kernel.sysctl = {
    "net.bridge.bridge-nf-call-ip6tables" = 1;
    "net.bridge.bridge-nf-call-iptables" = 1;
    "net.ipv4.ip_forward" = 1;
  };

Have you added this?

My k3s config (lots of junk): https://termbin.com/2kea

collinarnett commented 2 years ago

After much experimentation, I figured out that what I was missing was opening these ports on both machines.

networking.firewall.allowedTCPPorts = [ 6443 ];
networking.firewall.allowedUDPPorts = [ 8472 ];

I didn't need any other additions to my configs.

victorbiga commented 1 year ago

The suggested one-liner works, as @collinarnett mentioned.

This is also covered in the k3s networking docs: https://docs.k3s.io/installation/requirements#networking