flannel-io / flannel

flannel is a network fabric for containers, designed for Kubernetes
Apache License 2.0

Flannel in arm64 CrashLoopBackOff after 60 secs #1130

Closed chenchix closed 1 year ago

chenchix commented 5 years ago

After running flannel on the master (amd64), it dies on the worker, which is arm64.

Expected Behavior

Flannel pod should work fine in arm64

Current Behavior

After 60 secs:

kube-system kube-flannel-ds-amd64-f7h9n 1/1 Running 0 17h
kube-system kube-flannel-ds-arm64-2f4sz 0/1 CrashLoopBackOff 5 6m40s

Possible Solution

Steps to Reproduce (for bugs)

kubeadm init --pod-network-cidr=10.244.0.0/16
cp stuffs...
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
And the join command in worker

After a while (about 60 secs), the pod on arm64 dies with CrashLoopBackOff state and the logs show these lines:

I0425 08:03:09.414654       1 main.go:514] Determining IP address of default interface
I0425 08:03:09.415980       1 main.go:527] Using interface with name eth2 and address 192.168.24.252
I0425 08:03:09.416036       1 main.go:544] Defaulting external address to interface address (192.168.24.252)
E0425 08:03:39.420548       1 main.go:241] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-arm64-2f4sz': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-arm64-2f4sz: dial tcp 10.96.0.1:443: i/o timeout

If you open the link from the error message, this is the JSON it returns:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "pods \"kube-flannel-ds-arm64-2f4sz\" is forbidden: User \"system:anonymous\" cannot get resource \"pods\" in API group \"\" in the namespace \"kube-system\"",
  "reason": "Forbidden",
  "details": {
    "name": "kube-flannel-ds-arm64-2f4sz",
    "kind": "pods"
  },
  "code": 403
}

Context

Your Environment

zkpingguo commented 5 years ago

I got the same problem when using flannel v0.11.0 on debian8 with an embedded arm platform. flannel always restarts because of "dial tcp 10.96.0.1:443: i/o timeout". Can anyone help with this?

zkpingguo commented 5 years ago

Bug fixed: the pod needs a route to 10.96.0.1. I used "route add -net 10.96.0.0/16 gw xxx dev xxx" on the node to fix this problem.
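The route above only helps if the API service address from the log (10.96.0.1) actually falls inside the network being routed. A minimal sketch for checking that in plain POSIX shell follows; the function names `ip_to_int` and `ip_in_cidr` are hypothetical helpers, not part of flannel or kubeadm, and the gateway and device still have to be filled in for your own node.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  # Split the address on dots into the positional parameters $1..$4.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if address $1 lies inside the CIDR block $2.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")   # network part, before the slash
  bits=${2#*/}                 # prefix length, after the slash
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# Example usage:
#   ip_in_cidr 10.96.0.1 10.96.0.0/16 && echo "covered by the route"
```

If the check fails for your cluster's service address, the network in the route command would need to be adjusted to match your service range.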

chenchix commented 5 years ago

In my case, I needed to create /run/flannel/subnet.env in the pod before running the join command.
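For anyone trying the same workaround, here is a rough sketch of writing that file by hand. The variable names follow the format flanneld normally writes itself, but every value is an assumption: FLANNEL_NETWORK is taken from the --pod-network-cidr in the report, while the subnet, MTU and masquerade settings are placeholders that must match your own node. `write_subnet_env` is a hypothetical helper name.

```shell
# Write a subnet.env file to the given path (default: /run/flannel/subnet.env).
# All values below are illustrative assumptions, not taken from this cluster.
write_subnet_env() {
  target=${1:-/run/flannel/subnet.env}
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<EOF
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
}

# Example usage (needs root for the default path):
#   write_subnet_env
```

flanneld normally generates this file once it starts successfully, so creating it manually is only a stopgap.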

ChrisM-liu commented 5 years ago

Bug fixed: the pod needs a route to 10.96.0.1. I used "route add -net 10.96.0.0/16 gw xxx dev xxx" on the node to fix this problem.

What is "gw xxx dev xxx"?

100cm commented 5 years ago

what is gw xxx dev xxx ?

seanrclayton commented 5 years ago

The gateway is the one assigned to your flannel subnet (which you should have set when you ran kubeadm init), and dev is the device to create the route on, e.g. eth0. These instructions are unclear, though: unless you are in the same subnet as your Kubernetes system, this won't work. The fix for me was:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

ukreddy-erwin commented 4 years ago

sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

But the network is still failing.

dony71 commented 3 years ago

I have the same problem. Images used:

k8s.gcr.io/kube-proxy v1.19.3
k8s.gcr.io/kube-controller-manager v1.19.3
k8s.gcr.io/kube-apiserver v1.19.3
k8s.gcr.io/kube-scheduler v1.19.3
k8s.gcr.io/etcd 3.4.13-0
k8s.gcr.io/coredns 1.7.0
k8s.gcr.io/pause 3.2
quay.io/coreos/flannel v0.13.0

kubectl describe -n kube-system pod kube-flannel-ds-4gksk

Name:                 kube-flannel-ds-4gksk
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 cloudstack-mgmt/172.31.36.107
Start Time:           Fri, 23 Oct 2020 18:11:51 -0700
Labels:               app=flannel
                      controller-revision-hash=56df9fd6f9
                      pod-template-generation=1
                      tier=node
Annotations:          <none>
Status:               Pending
IP:                   172.31.36.107
IPs:
  IP:           172.31.36.107
Controlled By:  DaemonSet/kube-flannel-ds
Init Containers:
  install-cni:
    Container ID:  docker://e0e11e1528f9cfa06885ebc6af0591e6bd39d752d666feb633dc08d8d4928cc2
    Image:         quay.io/coreos/flannel:v0.13.0
    Image ID:      docker-pullable://quay.io/coreos/flannel@sha256:ac5322604bcab484955e6dbc507f45a906bde79046667322e3918a8578ab08c8
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 23 Oct 2020 18:12:15 -0700
      Finished:     Fri, 23 Oct 2020 18:12:15 -0700
    Ready:          False
    Restart Count:  2
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-l6mpn (ro)
Containers:
  kube-flannel:
    Container ID:
    Image:         quay.io/coreos/flannel:v0.13.0
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-flannel-ds-4gksk (v1:metadata.name)
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/flannel from run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-l6mpn (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run/flannel
    HostPathType:
  cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  flannel-token-l6mpn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flannel-token-l6mpn
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  63s                default-scheduler  Successfully assigned kube-system/kube-flannel-ds-4gksk to cloudstack-mgmt
  Normal   Created    40s (x3 over 60s)  kubelet            Created container install-cni
  Normal   Started    39s (x3 over 59s)  kubelet            Started container install-cni
  Warning  BackOff    26s (x4 over 54s)  kubelet            Back-off restarting failed container
  Normal   Pulled     13s (x4 over 60s)  kubelet            Container image "quay.io/coreos/flannel:v0.13.0" already present on machine

sktrinh12 commented 2 years ago

I got the same issue:

I0326 07:47:47.666267       1 main.go:518] Determining IP address of default interface
I0326 07:47:47.759994       1 main.go:531] Using interface with name eth0 and address 192.168.1.15
I0326 07:47:47.760128       1 main.go:548] Defaulting external address to interface address (192.168.1.15)
W0326 07:47:47.760224       1 client_config.go:517] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0326 07:47:47.987387       1 main.go:243] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-arm64-9hd9d': pods "kube-flannel-ds-arm64-9hd9d" is forbidden: User "system:serviceaccount:kube-system:flannel" cannot get resource "pods" in API group "" in the namespace "kube-system"

Any luck with a solution?

noptanakhon commented 2 years ago

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

This works for me. I needed to delete the old flannel pod after running the command.
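Note that the log excerpts in this thread fail in two different ways: the original report is a dial timeout to 10.96.0.1:443, while the later one is a "forbidden" response for the flannel service account. The replies suggest a route fix for the first and the RBAC manifest for the second. A small sketch for telling them apart from a flanneld log line follows; `classify_flannel_error` and its output labels are hypothetical names chosen for illustration.

```shell
# Classify a flanneld error line by the kind of failure it reports.
# Prints one of: network-timeout, rbac-forbidden, unknown.
classify_flannel_error() {
  case "$1" in
    # API server address unreachable from the pod: check routes to the service IP.
    *"i/o timeout"*)  echo "network-timeout" ;;
    # API server reachable but the request was denied: check RBAC for the service account.
    *"is forbidden"*) echo "rbac-forbidden" ;;
    *)                echo "unknown" ;;
  esac
}

# Example usage (pod name is a placeholder):
#   classify_flannel_error "$(kubectl -n kube-system logs <flannel-pod> | tail -n 1)"
```

Only the "forbidden" case is something the RBAC manifest can address; a timeout means the request never reached the API server at all.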

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.