flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes
Apache License 2.0

docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1 cannot run #2031

Open bencyq opened 1 month ago

bencyq commented 1 month ago

Expected Behavior

k8s pod kube-flannel-ds-vjhqf is ready; docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1 is functioning properly

Current Behavior

The k8s pod kube-flannel-ds-vjhqf is permanently stuck in state Init:RunContainerError. It fails at the install-cni-plugin init container:

Init Containers:
  install-cni-plugin:
    Container ID:  
    Image:         docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /flannel
      /opt/cni/bin/flannel
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      cannot join network of a non running container: 0aae74cae93531c157f438f6f3aed81ca48f5fff388e3a6b6ddabc6d69837884
      Exit Code:    128
      Started:      Tue, 13 Aug 2024 10:53:59 +0800
      Finished:     Tue, 13 Aug 2024 10:53:59 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/cni/bin from cni-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)

Possible Solution

Steps to Reproduce (for bugs)

  1. $kubeadm join ...
  2. $kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
  3. $kubectl get pods --all-namespaces
    NAMESPACE      NAME                                 READY   STATUS                   RESTARTS           AGE
    kube-flannel   kube-flannel-ds-vjhqf                0/1     Init:RunContainerError   0                  17d
    kube-flannel   kube-flannel-ds-x2xhk                1/1     Running                  0                  19d
    kube-flannel   kube-flannel-ds-zzhls                1/1     Running                  0                  19d
    kube-system    coredns-7db6d8ff4d-m4282             1/1     Running                  0                  27d
    kube-system    coredns-7db6d8ff4d-tchbs             1/1     Running                  0                  27d
    kube-system    etcd-k8s-master                      1/1     Running                  1 (27d ago)        27d
    kube-system    kube-apiserver-k8s-master            1/1     Running                  1 (27d ago)        27d
    kube-system    kube-controller-manager-k8s-master   1/1     Running                  1 (27d ago)        27d
    kube-system    kube-proxy-nb2jc                     1/1     Running                  0                  20d
    kube-system    kube-proxy-xx8vt                     0/1     CrashLoopBackOff         5170 (2m13s ago)   20d
    kube-system    kube-proxy-zc9r8                     1/1     Running                  0                  27d
    kube-system    kube-scheduler-k8s-master            1/1     Running                  1 (27d ago)        27d
  4. $kubectl describe pods kube-flannel-ds-vjhqf -n kube-flannel
    Name:                 kube-flannel-ds-vjhqf
    Namespace:            kube-flannel
    Priority:             2000001000
    Priority Class Name:  system-node-critical
    Service Account:      flannel
    Node:                 hd-ascend/10.90.1.237
    Start Time:           Fri, 26 Jul 2024 10:57:59 +0800
    Labels:               app=flannel
                      controller-revision-hash=bb4dc6cbf
                      k8s-app=flannel
                      pod-template-generation=2
                      tier=node
    Annotations:          <none>
    Status:               Pending
    IP:                   10.90.1.237
    IPs:
    IP:           10.90.1.237
    Controlled By:  DaemonSet/kube-flannel-ds
    Init Containers:
    install-cni-plugin:
    Container ID:  
    Image:         docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /flannel
      /opt/cni/bin/flannel
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      cannot join network of a non running container: 0aae74cae93531c157f438f6f3aed81ca48f5fff388e3a6b6ddabc6d69837884
      Exit Code:    128
      Started:      Tue, 13 Aug 2024 10:53:59 +0800
      Finished:     Tue, 13 Aug 2024 10:53:59 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/cni/bin from cni-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)
    install-cni:
    Container ID:  
    Image:         docker.io/flannel/flannel:v0.25.5
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)
    Containers:
    kube-flannel:
    Container ID:  
    Image:         docker.io/flannel/flannel:v0.25.5
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:           kube-flannel-ds-vjhqf (v1:metadata.name)
      POD_NAMESPACE:      kube-flannel (v1:metadata.namespace)
      EVENT_QUEUE_DEPTH:  5000
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/flannel from run (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xpgnc (ro)
    Conditions:
    Type                        Status
    PodReadyToStartContainers   False 
    Initialized                 False 
    Ready                       False 
    ContainersReady             False 
    PodScheduled                True 
    Volumes:
    run:
    Type:          HostPath (bare host directory volume)
    Path:          /run/flannel
    HostPathType:  
    cni-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
    cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
    flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
    xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
    kube-api-access-xpgnc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    QoS Class:                   Burstable
    Node-Selectors:              <none>
    Tolerations:                 :NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
    Events:
    Type     Reason          Age                        From     Message
    ----     ------          ----                       ----     -------
    Warning  BackOff         41m (x698821 over 17d)     kubelet  Back-off restarting failed container install-cni-plugin in pod kube-flannel-ds-vjhqf_kube-flannel(75d80e9d-5ae7-4287-9184-ace04731cd62)
    Normal   SandboxChanged  6m27s (x1403921 over 17d)  kubelet  Pod sandbox changed, it will be killed and re-created.
    Normal   Pulled          87s (x702562 over 17d)     kubelet  Container image "docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel1" already present on machine

Context

The node never becomes Ready in the k8s cluster. I'm using an arm64 machine as a node joining an x86 cluster; does that matter?
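
To confirm which nodes run which architecture, one option is to read the architecture each kubelet reports. The following is a minimal Python sketch, not part of flannel: the helper names `node_architectures` and `fetch_node_list` are hypothetical, and the sample data is illustrative (its architectures are assumed from the description above, not copied from the real cluster). It reads the `status.nodeInfo.architecture` field from the output of `kubectl get nodes -o json`.

```python
import json
import subprocess


def node_architectures(node_list: dict) -> dict[str, str]:
    """Map node name -> CPU architecture from a `kubectl get nodes -o json` document."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        # status.nodeInfo.architecture is reported by the kubelet, e.g. "amd64" or "arm64".
        node_info = node.get("status", {}).get("nodeInfo", {})
        result[name] = node_info.get("architecture", "unknown")
    return result


def fetch_node_list() -> dict:
    """Run kubectl and parse its JSON output (needs access to the cluster)."""
    completed = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


# Illustrative sample only: node names come from this report, architectures are assumed.
sample_nodes = {
    "items": [
        {"metadata": {"name": "k8s-master"},
         "status": {"nodeInfo": {"architecture": "amd64"}}},
        {"metadata": {"name": "hd-ascend"},
         "status": {"nodeInfo": {"architecture": "arm64"}}},
    ]
}

if __name__ == "__main__":
    for name, arch in node_architectures(sample_nodes).items():
        print(f"{name}: {arch}")
```

Against a live cluster, `node_architectures(fetch_node_list())` would give the same mapping for the real nodes.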

Your Environment

rbrtbnfgl commented 1 month ago

So the only node where it fails is the arm64 one? I'll check whether the arm64 container image was built correctly. Could you check the logs of the failing pod with kubectl?

bencyq commented 1 month ago

> So the only node where it fails is the arm64 one? I'll check whether the arm64 container image was built correctly. Could you check the logs of the failing pod with kubectl?

Thank you for your reply. Here are the logs.

$ kubectl logs kube-flannel-ds-vjhqf -n kube-flannel
Defaulted container "kube-flannel" out of: kube-flannel, install-cni-plugin (init), install-cni (init)
Error from server (BadRequest): container "kube-flannel" in pod "kube-flannel-ds-vjhqf" is waiting to start: PodInitializing

$ kubectl logs kube-proxy-xx8vt -n kube-system
failed to try resolving symlinks in path "/var/log/pods/kube-system_kube-proxy-xx8vt_7758f284-f039-4aeb-bbf5-11da20d35c8f/kube-proxy/6043.log": lstat /var/log/pods/kube-system_kube-proxy-xx8vt_7758f284-f039-4aeb-bbf5-11da20d35c8f/kube-proxy/6043.log: no such file or directory

rbrtbnfgl commented 1 month ago

What is the output of kubectl logs kube-flannel-ds-vjhqf -n kube-flannel -c install-cni-plugin?

zhangguanzhang commented 3 weeks ago

crictl ps -a

x3nb63 commented 2 weeks ago

I face a very similar looking problem: the Flannel DaemonSet pod fails to come up.

It fails on the install-cni-plugin init-container, which goes into state CreateContainerConfigError pretty much immediately.

NAME↑               PF IMAGE                                                 READY  STATE                       INIT   RESTARTS PROBES(L:R) CPU/R:L MEM/R:L PORTS AGE
install-cni         ●  docker.io/flannel/flannel:v0.25.6                     true   Completed                   true          0 off:off         0:0     0:0       6h22
install-cni-plugin  ●  docker.io/flannel/flannel-cni-plugin:v1.5.1-flannel2  false  CreateContainerConfigError  true          0 off:off         0:0     0:0       6h22
kube-flannel        ●  docker.io/flannel/flannel:v0.25.6                     false  Unknown                     false         0 off:off       100:0    50:0       6h22

It does not give any output:

unable to retrieve container logs for containerd://f00f752cb6d46e4b2f866d5f6ec5ca3be330353121177d7480db396ecace6904

As I understand CreateContainerConfigError, no output is to be expected, because the error happens before any binary or entrypoint from the image is started.
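
Since the container never starts, the kubelet's reason for not starting it usually ends up in the pod status rather than in the logs. Here is a minimal Python sketch of pulling it out of `kubectl get pod <name> -n kube-flannel -o json`. The function name `waiting_containers` is hypothetical, and the sample status is illustrative: the install-cni-plugin entry mirrors what is reported in this comment, while the kube-flannel entry is an assumption.

```python
def waiting_containers(pod: dict) -> list[tuple[str, str, str]]:
    """Return (container name, reason, message) for every container in a waiting state."""
    status = pod.get("status", {})
    found = []
    # Init containers and regular containers are reported in separate status lists.
    for key in ("initContainerStatuses", "containerStatuses"):
        for container_status in status.get(key, []):
            waiting = container_status.get("state", {}).get("waiting")
            if waiting:
                found.append((
                    container_status["name"],
                    waiting.get("reason", ""),
                    waiting.get("message", ""),
                ))
    return found


# Illustrative pod status; only the install-cni-plugin entry reflects this report.
sample_pod = {
    "status": {
        "initContainerStatuses": [
            {"name": "install-cni-plugin",
             "state": {"waiting": {
                 "reason": "CreateContainerConfigError",
                 "message": "services have not yet been read at least once, "
                            "cannot construct envvars",
             }}},
            {"name": "install-cni",
             "state": {"terminated": {"reason": "Completed", "exitCode": 0}}},
        ],
        "containerStatuses": [
            # Assumed state for the main container while init containers are pending.
            {"name": "kube-flannel",
             "state": {"waiting": {"reason": "PodInitializing"}}},
        ],
    }
}

if __name__ == "__main__":
    for name, reason, message in waiting_containers(sample_pod):
        print(f"{name}: {reason} {message}".rstrip())
```

Containers that already terminated (such as install-cni in the sample) are skipped on purpose, since only the waiting ones carry the start-up error.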

kubectl describe pod kube-flannel-ds-59hwh shows this error:

Warning  Failed          16m (x12 over 18m)    kubelet          Error: services have not yet been read at least once, cannot construct envvars

This started while upgrading Kubernetes from v1.30.3 to v1.31.0: it happens on every node I reboot into the later version.

Looking through CHANGELOG-1.31, I can't tell what could be related.

My guess is that it may be related to the use of the Downward API for two env: variables, which get filled via fieldRef: -> fieldPath: metadata.XYZ. That's more guessing than knowing.

x3nb63 commented 2 weeks ago

I reverted my Kubernetes version from v1.31.0 to v1.30.3, and the flannel-cni-plugin:v1.5.1-flannel2 init-container now succeeds, so flannel:v0.25.6 starts up fine as a consequence.

... so I think there is clearly a problem coming from the changes with Kubernetes v1.31.0.

thomasferrandiz commented 2 weeks ago

Hi, I tested flannel with k8s 1.31 on both amd64 and arm64 and had no issues. My test was on Ubuntu 24.04.

Can you show the kernel version that you're using and the kernel logs?

x3nb63 commented 1 week ago

the system is

$ uname -a
Linux kc04  6.6.43-flatcar #1 SMP PREEMPT_DYNAMIC Mon Aug  5 20:36:27 -00 2024 x86_64 Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz GenuineIntel GNU/Linux

$ cat /etc/os-release
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3975.2.0
VERSION_ID=3975.2.0
BUILD_ID=2024-08-05-2103
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3975.2.0 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3975.2.0:*:*:*:*:*:*:*"

To switch the K8s version, I toggle the /etc/extensions/kubernetes.raw symlink between /opt/extensions/kubernetes/kubernetes-v1.30.3-x86-64.raw and /opt/extensions/kubernetes/kubernetes-v1.31.0-x86-64.raw

(This is Flatcar's way of "blending in" software, using systemd-sysext with their sysext-bakery.)
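
For anyone trying to reproduce the toggle, a minimal Python sketch of switching such a symlink follows. It is only an illustration of the idea, not Flatcar tooling: `switch_symlink` and `demo` are hypothetical names, and the demo runs in a throwaway directory with empty placeholder files instead of the real /etc/extensions and /opt/extensions paths. It does not reboot or reload anything; the new target only takes effect once the system picks up the changed extension.

```python
import os
import tempfile


def switch_symlink(link_path: str, new_target: str) -> None:
    """Point link_path at new_target without leaving a window where the link is missing."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    tmp_path = os.path.join(link_dir, f".{os.path.basename(link_path)}.tmp")
    # Clean up a leftover temporary link from an earlier, interrupted run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_target, tmp_path)
    # Renaming over the old link replaces the link itself, not the file it points to.
    os.replace(tmp_path, link_path)


def demo() -> str:
    """Exercise switch_symlink in a temporary directory; returns the final target's file name."""
    with tempfile.TemporaryDirectory() as workdir:
        old = os.path.join(workdir, "kubernetes-v1.30.3-x86-64.raw")
        new = os.path.join(workdir, "kubernetes-v1.31.0-x86-64.raw")
        for path in (old, new):
            open(path, "w").close()  # empty placeholder files
        link = os.path.join(workdir, "kubernetes.raw")
        os.symlink(old, link)
        switch_symlink(link, new)
        return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    print(demo())
```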

I can't provide kernel logs right now, as I would need to take down a node to get a clean one, and they are all busy.

Note that the kernel version and OS version do not change here; I only toggle the K8s binaries and reboot so that everything starts properly after "blending".