canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Pods stuck in ContainerCreating status, Failed create pod sandbox #48

Closed: davecore82 closed this issue 6 years ago

davecore82 commented 6 years ago

When running "microk8s.enable dns dashboard", the pods stay stuck in ContainerCreating status:

$ sudo snap install microk8s --beta --classic
microk8s (beta) v1.10.3 from 'canonical' installed

$ microk8s.kubectl get all 
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.152.183.1   <none>        443/TCP   25s

$ microk8s.enable dns dashboard
Applying DNS manifest
service "kube-dns" created
serviceaccount "kube-dns" created
configmap "kube-dns" created
deployment.extensions "kube-dns" created
Restarting kubelet
Done
deployment.extensions "kubernetes-dashboard" created
service "kubernetes-dashboard" created
service "monitoring-grafana" created
replicationcontroller "monitoring-influxdb-grafana-v4" created
service "monitoring-influxdb" created

$ microk8s.kubectl get all --all-namespaces
NAMESPACE     NAME                                        READY     STATUS              RESTARTS   AGE
kube-system   pod/kube-dns-598d7bf7d4-f8lbm               0/3       ContainerCreating   0          9s
kube-system   pod/kubernetes-dashboard-545868474d-ltkg8   0/1       Pending             0          4s
kube-system   pod/monitoring-influxdb-grafana-v4-5qxm6    0/2       Pending             0          4s

NAMESPACE     NAME                                                   DESIRED   CURRENT   READY     AGE
kube-system   replicationcontroller/monitoring-influxdb-grafana-v4   1         1         0         4s

NAMESPACE     NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
default       service/kubernetes             ClusterIP   10.152.183.1     <none>        443/TCP             1m
kube-system   service/kube-dns               ClusterIP   10.152.183.10    <none>        53/UDP,53/TCP       9s
kube-system   service/kubernetes-dashboard   ClusterIP   10.152.183.204   <none>        80/TCP              4s
kube-system   service/monitoring-grafana     ClusterIP   10.152.183.115   <none>        80/TCP              4s
kube-system   service/monitoring-influxdb    ClusterIP   10.152.183.228   <none>        8083/TCP,8086/TCP   4s

NAMESPACE     NAME                                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/kube-dns               1         1         1            0           9s
kube-system   deployment.apps/kubernetes-dashboard   1         1         1            0           4s

NAMESPACE     NAME                                              DESIRED   CURRENT   READY     AGE
kube-system   replicaset.apps/kube-dns-598d7bf7d4               1         1         0         9s
kube-system   replicaset.apps/kubernetes-dashboard-545868474d   1         1         0         4s

The pods remain stuck in ContainerCreating status.

$ microk8s.kubectl describe pod/kubernetes-dashboard-545868474d-ltkg8 --namespace kube-system
Name:           kubernetes-dashboard-545868474d-ltkg8
Namespace:      kube-system
Node:           <hostname>/192.168.1.17
Start Time:     Tue, 12 Jun 2018 14:33:39 -0400
Labels:         k8s-app=kubernetes-dashboard
                pod-template-hash=1014240308
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Pending
IP:             
Controlled By:  ReplicaSet/kubernetes-dashboard-545868474d
Containers:
  kubernetes-dashboard:
    Container ID:   
    Image:          gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.0
    Image ID:       
    Port:           9090/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:        100m
      memory:     50Mi
    Liveness:     http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vxq5n (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  default-token-vxq5n:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vxq5n
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
Events:
  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Normal   Scheduled               13m                default-scheduler        Successfully assigned kubernetes-dashboard-545868474d-ltkg8 to <hostname>
  Normal   SuccessfulMountVolume   13m                kubelet, <hostname>  MountVolume.SetUp succeeded for volume "default-token-vxq5n"
  Warning  FailedCreatePodSandBox  13m                kubelet, <hostname>  Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin kubenet failed to set up pod "kubernetes-dashboard-545868474d-ltkg8_kube-system" network: Error adding container to network: failed to Statfs "/proc/6763/ns/net": permission denied
  Normal   SandboxChanged          3m (x40 over 13m)  kubelet, <hostname>  Pod sandbox changed, it will be killed and re-created.
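
A Statfs permission denial on /proc/<pid>/ns/net suggests something is blocking access to the pod's network namespace. A minimal first check, assuming AppArmor or another LSM could be interfering (an assumption, not confirmed here):

$ dmesg | grep -i 'apparmor.*denied'
$ journalctl -u snap.microk8s.daemon-kubelet | grep -i 'permission denied'
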
ktsakalozos commented 6 years ago

Hi @davecore82,

Thank you for reporting this.

Would you be able to share the output of:

journalctl -u snap.microk8s.daemon-kubelet
journalctl -u snap.microk8s.daemon-apiserver
journalctl -u snap.microk8s.daemon-docker
journalctl -u snap.microk8s.daemon-proxy
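
If it is easier to attach files, the same logs can be captured with something like the following (file names are arbitrary):

journalctl -u snap.microk8s.daemon-kubelet --no-pager > kubelet.log
journalctl -u snap.microk8s.daemon-apiserver --no-pager > apiserver.log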

Could you also describe your setup? What distribution are you on? Is there anything special in your setup I should know about when trying to reproduce this issue?

Can you also make sure the same issue is present on the edge channel?

sudo snap install microk8s --edge --classic

I appreciate your time,

Konstantinos

davecore82 commented 6 years ago

journalctl_-u_snap.microk8s.daemon-proxy.txt

journalctl_-u_snap.microk8s.daemon-docker.txt

journalctl_-u_snap.microk8s.daemon-kubelet.txt

journalctl_-u_snap.microk8s.daemon-apiserver.zip

davecore82 commented 6 years ago

Hi Konstantinos,

I uploaded the outputs of those commands.

I'm running Ubuntu 17.10 on a laptop. This used to work last week. I removed microk8s once I was done with testing, and when I tried reinstalling today I ran into this issue. I restarted my laptop and tried reinstalling again, but hit the same issue.

This is the output of "snap list":

$ snap list 
Name              Version                 Rev   Tracking  Developer         Notes
conjure-up        2.5.7-20180606.1806     1006  stable    canonical         classic
core              16-2.32.8               4650  stable    canonical         core
docker            17.06.2-ce              179   stable    docker-inc        -
google-cloud-sdk  204.0.0                 38    stable    google-cloud-sdk  classic
juju              2.3.8                   4423  stable    canonical         classic
kubectl           1.10.3                  405   stable    canonical         classic
kubefed           1.9.0-alpha3            379   stable    canonical         classic
lxd               3.1                     7412  stable    canonical         -
microk8s          v1.10.3                 55    beta      canonical         classic
skype             8.22.0.2                33    stable    skype             classic
slack             3.2.1                   7     stable    slack             classic
spotify           1.0.80.474.gef6b503e-7  16    stable    spotify           -
telegram-desktop  1.3.0                   213   stable    3v1n0             -

I'll try with microk8s in edge and report back.

Thanks!

David

davecore82 commented 6 years ago

Trying to remove microk8s fails with:

$ snap remove microk8s
error: cannot perform the following tasks:
- Stop snap "microk8s" services ([start snap.microk8s.daemon-scheduler.service snap.microk8s.daemon-etcd.service snap.microk8s.daemon-proxy.service snap.microk8s.daemon-apiserver.service snap.microk8s.daemon-controller-manager.service snap.microk8s.daemon-docker.service snap.microk8s.daemon-kubelet.service] failed with exit status 5: Failed to start snap.microk8s.daemon-scheduler.service: Unit snap.microk8s.daemon-scheduler.service not found.
Failed to start snap.microk8s.daemon-etcd.service: Unit snap.microk8s.daemon-etcd.service not found.
Failed to start snap.microk8s.daemon-proxy.service: Unit snap.microk8s.daemon-proxy.service not found.
Failed to start snap.microk8s.daemon-apiserver.service: Unit snap.microk8s.daemon-apiserver.service not found.
Failed to start snap.microk8s.daemon-controller-manager.service: Unit snap.microk8s.daemon-controller-manager.service not found.
Failed to start snap.microk8s.daemon-docker.service: Unit snap.microk8s.daemon-docker.service not found.
Failed to start snap.microk8s.daemon-kubelet.service: Unit snap.microk8s.daemon-kubelet.service not found.
)
- Remove data for snap "microk8s" (55) (remove /var/snap/microk8s/common/var/run/docker/netns/9a1ac483706b: device or resource busy)
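
For what it's worth, a quick way to see which microk8s mounts are still active and holding up removal (a sketch):

$ grep microk8s /proc/mounts
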
davecore82 commented 6 years ago

I rebooted my laptop and then I could remove microk8s:

$ snap remove microk8s
microk8s removed

I installed microk8s from edge:

$ sudo snap install microk8s --edge --classic
microk8s (edge) v1.10.4 from 'canonical' installed

But I still have the same problem:

$ microk8s.enable dns dashboard
Enabling DNS
Applying manifest
service "kube-dns" created
serviceaccount "kube-dns" created
configmap "kube-dns" created
deployment.extensions "kube-dns" created
Restarting kubelet
DNS is enabled
Enabling dashboard
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
deployment.apps "kubernetes-dashboard" created
service "kubernetes-dashboard" created
service "monitoring-grafana" created
service "monitoring-influxdb" created
service "heapster" created
deployment.extensions "monitoring-influxdb-grafana-v4" created
serviceaccount "heapster" created
configmap "heapster-config" created
configmap "eventer-config" created
deployment.extensions "heapster-v1.5.2" created
dashboard enabled

$ microk8s.kubectl get pods --all-namespaces 
NAMESPACE     NAME                                             READY     STATUS              RESTARTS   AGE
kube-system   heapster-v1.5.2-556597699d-vv4h4                 0/4       ContainerCreating   0          13s
kube-system   kube-dns-598d7bf7d4-qfc5x                        0/3       ContainerCreating   0          19s
kube-system   kubernetes-dashboard-7d5dcdb6d9-mlx6t            0/1       ContainerCreating   0          14s
kube-system   monitoring-influxdb-grafana-v4-6d67c7f4f-57rph   0/2       ContainerCreating   0          13s

$ microk8s.kubectl describe pod kubernetes-dashboard-7d5dcdb6d9-mlx6t --namespace kube-system
Name:           kubernetes-dashboard-7d5dcdb6d9-mlx6t
Namespace:      kube-system
Node:           <hostname>/192.168.1.5
Start Time:     Tue, 12 Jun 2018 22:24:09 -0400
Labels:         k8s-app=kubernetes-dashboard
                pod-template-hash=3818786285
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  ReplicaSet/kubernetes-dashboard-7d5dcdb6d9
Containers:
  kubernetes-dashboard:
    Container ID:  
    Image:         k8s.gcr.io/kubernetes-dashboard-amd64:v1.8.3
    Image ID:      
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --auto-generate-certificates
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Liveness:       http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /certs from kubernetes-dashboard-certs (rw)
      /tmp from tmp-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-pt66b (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  kubernetes-dashboard-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubernetes-dashboard-certs
    Optional:    false
  tmp-volume:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  kubernetes-dashboard-token-pt66b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubernetes-dashboard-token-pt66b
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               39s               default-scheduler        Successfully assigned kubernetes-dashboard-7d5dcdb6d9-mlx6t to <hostname>
  Normal   SuccessfulMountVolume   35s               kubelet, <hostname>  MountVolume.SetUp succeeded for volume "tmp-volume"
  Normal   SuccessfulMountVolume   35s               kubelet, <hostname>  MountVolume.SetUp succeeded for volume "kubernetes-dashboard-token-pt66b"
  Normal   SuccessfulMountVolume   34s               kubelet, <hostname>  MountVolume.SetUp succeeded for volume "kubernetes-dashboard-certs"
  Warning  FailedCreatePodSandBox  32s               kubelet, <hostname>  Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin kubenet failed to set up pod "kubernetes-dashboard-7d5dcdb6d9-mlx6t_kube-system" network: Error adding container to network: failed to Statfs "/proc/6174/ns/net": permission denied
  Normal   SandboxChanged          1s (x3 over 31s)  kubelet, <hostname>  Pod sandbox changed, it will be killed and re-created.

davecore82 commented 6 years ago

It looks like microk8s uses the kubenet plugin. Am I supposed to have a cbr0 bridge on my laptop?

From https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/

Kubenet creates a Linux bridge named cbr0 and creates a veth pair for each pod with the host end of each pair connected to cbr0.

I don't think I have that:

$ ip -o -4 a s
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: enp1s0f0    inet 192.168.1.5/24 brd 192.168.1.255 scope global dynamic enp1s0f0\       valid_lft 86006sec preferred_lft 86006sec
4: virbr1    inet 192.168.100.1/24 brd 192.168.100.255 scope global virbr1\       valid_lft forever preferred_lft forever
6: virbr0    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0\       valid_lft forever preferred_lft forever
8: lxdbr0    inet 10.137.13.1/24 scope global lxdbr0\       valid_lft forever preferred_lft forever
9: docker0    inet 172.17.0.1/16 scope global docker0\       valid_lft forever preferred_lft forever
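
A direct check, for completeness (ip reports the device as missing if kubenet never created the bridge):

$ ip link show cbr0
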
ktsakalozos commented 6 years ago

The issue with microk8s failing to remove cleanly is known. We have raised the issue with the snap team https://forum.snapcraft.io/t/cleanup-before-services-go-down/5802 to get some help in addressing it.

For now you need to make sure you do not have any pods running when you remove microk8s. We have introduced the microk8s.reset command to help with the cleanup that needs to happen. Call microk8s.reset right before snap remove microk8s.
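
In other words, the clean removal sequence is:

microk8s.reset
sudo snap remove microk8s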

davecore82 commented 6 years ago

Hi Konstantinos,

I tried running microk8s.reset but my pods stayed in the Terminating state. I ended up rebooting my laptop and removing the microk8s snap.

Let me know if you need more testing from my side.

Thanks,

David

ktsakalozos commented 6 years ago

Hi David,

I see kube-proxy showing some errors while it starts, so you are probably right about cbr0 not being created.

I got a clean VM with 17.10 on Amazon but I am unable to reproduce this. Did anything happen during the week you were using microk8s? Anything you installed? Any update that came in? Anything special about the networking of your system, iptables-related perhaps? I will keep hammering on this for a while longer, but I do not see what could have caused this problem.

Some things I would like you to double-check. Make sure you start with a clean system: when you remove microk8s, verify that /var/snap/microk8s and /snap/microk8s no longer exist, and make sure /proc/mounts has nothing microk8s-related (cat /proc/mounts | grep microk8s).
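
As a concrete version of those checks (a sketch):

ls -d /var/snap/microk8s /snap/microk8s 2>/dev/null   # should print nothing
cat /proc/mounts | grep microk8s                      # should print nothing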

Please use the latest edge build: do a sudo snap install microk8s --edge --classic and then snap refresh microk8s --edge just in case. Then make sure the following services are running (a compact loop version follows the list):

sudo systemctl status snap.microk8s.daemon-proxy
sudo systemctl status snap.microk8s.daemon-kubelet
sudo systemctl status snap.microk8s.daemon-apiserver
sudo systemctl status snap.microk8s.daemon-etcd
sudo systemctl status snap.microk8s.daemon-docker
sudo systemctl status snap.microk8s.daemon-controller-manager
sudo systemctl status snap.microk8s.daemon-scheduler
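
The same checks as a loop, if that is more convenient:

for svc in proxy kubelet apiserver etcd docker controller-manager scheduler; do
  sudo systemctl is-active snap.microk8s.daemon-$svc
done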

For any service that has not started, do you see anything in its logs? (e.g. journalctl -u snap.microk8s.daemon-proxy)

Anything else interesting/failing in /var/log/syslog?

Thanks

davecore82 commented 6 years ago

Hi Konstantinos,

This morning I tried again on my laptop and this time it looks like everything is OK.

$ sudo snap install microk8s --edge --classic
microk8s (edge) v1.10.4 from 'canonical' installed

$ microk8s.enable dns dashboard
Enabling DNS
Applying manifest
service "kube-dns" created
serviceaccount "kube-dns" created
configmap "kube-dns" created
deployment.extensions "kube-dns" created
Restarting kubelet
DNS is enabled
Enabling dashboard
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
deployment.apps "kubernetes-dashboard" created
service "kubernetes-dashboard" created
service "monitoring-grafana" created
service "monitoring-influxdb" created
service "heapster" created
deployment.extensions "monitoring-influxdb-grafana-v4" created
serviceaccount "heapster" created
configmap "heapster-config" created
configmap "eventer-config" created
deployment.extensions "heapster-v1.5.2" created
dashboard enabled

$ microk8s.kubectl get pods --all-namespaces
NAMESPACE     NAME                                             READY     STATUS    RESTARTS   AGE
kube-system   heapster-v1.5.2-8698858dc6-mg2s7                 4/4       Running   0          23s
kube-system   kube-dns-598d7bf7d4-4psgp                        3/3       Running   0          1m
kube-system   kubernetes-dashboard-7d5dcdb6d9-z84dh            1/1       Running   0          1m
kube-system   monitoring-influxdb-grafana-v4-6d67c7f4f-vjcxr   2/2       Running   0          1m

I know that yesterday I removed my docker snap after removing the microk8s snap, and then installed Docker from the Docker APT repositories. I wonder if the Docker snap could have been conflicting with the Docker bundled in microk8s? I'll try to reproduce the issue again.

davecore82 commented 6 years ago

FYI, I tried uninstalling microk8s but the process got stuck because of:

error: cannot perform the following tasks:
- Remove data for snap "microk8s" (93) (remove /var/snap/microk8s/common/var/lib/docker/aufs: device or resource busy)

I did a "sudo umount /var/snap/microk8s/common/var/lib/docker/aufs" and then I could remove the microk8s snap.

I reinstalled it again and everything works fine.

davecore82 commented 6 years ago

I tried uninstalling again but this time doing a microk8s.reset before and it uninstalled cleanly.

davecore82 commented 6 years ago

I tried to reproduce with the docker snap installed but I couldn't. Everything is still fine.

So I guess we can close this bug; I'm unable to reproduce the issue I had.

Thanks for your help!

ktsakalozos commented 6 years ago

We made some fixes on the cleanup/remove path (https://github.com/juju-solutions/microk8s/blob/master/snap/hooks/remove#L7) of the snap that may have addressed the issue you were having.

I am not very happy that we cannot reproduce it, but there is not much we can do.

Please re-open the issue if it reappears. Thank you.

zaneclaes commented 5 years ago

@ktsakalozos this just started happening to me today. The only thing I've done to my computer or deployment in the past month is update my Ubuntu 18.04 packages and reboot the machine, just before this started. None of my services will start; all pods are stuck with this status. Note that I'm not using the beta or edge channels. I've run microk8s.inspect and each of the above status commands; everything is running and looks fine.

zaneclaes commented 5 years ago

After reinstalling microk8s, a different error started appearing upon first attempting to apply the deployment.yml:

The Deployment "home-assistant" is invalid: spec.template.spec.containers[0].securityContext.privileged: Forbidden: disallowed by cluster policy

I don't understand why this deployment would suddenly be identified as invalid; as mentioned above, I made no changes to the file, and microk8s was already up to date (I ensured that I had run snap refresh). I tried removing the offending syntax from the deployment, but that subsequently triggered an error about a taint.

So I inspected the system pods, and found that the dns pod itself is only 2/3 ready with 66 restarts. Same goes for hostpath-provisioner. I made sure that I had done the appropriate sudo iptables -P FORWARD ACCEPT and disabled/re-enabled dns. Still the same problem. When I describe the pods, they're exhibiting the exact same symptoms (rpc errors followed by Pod sandbox changed, it will be killed and re-created.).
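
For reference, the "privileged: Forbidden: disallowed by cluster policy" error generally means the apiserver and kubelet were started without --allow-privileged=true. A hypothetical check for the microk8s layout of that era (paths assumed, not confirmed in this thread):

grep -r allow-privileged /var/snap/microk8s/current/args/
# if the flag is absent, it would need to be added there and the daemons restarted (assumption)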

shahzaibekram commented 3 years ago

I got the same problem. How do I solve it?

NAMESPACE     NAME                                         READY   STATUS              RESTARTS   AGE
kube-system   metrics-server-8bbfb4bdb-q9ldh               0/1     ContainerCreating   0          68m
kube-system   kubernetes-dashboard-7ffd448895-qm44w        0/1     ContainerCreating   0          68m
kube-system   dashboard-metrics-scraper-6c4568dc68-qrcjv   0/1     ContainerCreating   0          68m
kube-system   coredns-86f78bb79c-v7tgt                     0/1     ContainerCreating   0          70m
kube-system   calico-kube-controllers-847c8c99d-g59cp      0/1     ContainerCreating   0          72m
kube-system   calico-node-xph9f                            0/1     CrashLoopBackOff    17         72m
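
The calico-node pod in CrashLoopBackOff is the likely root cause here, since the other pods cannot get networking until the CNI is up. A first diagnostic sketch, with pod names taken from the listing above:

microk8s kubectl describe pod calico-node-xph9f -n kube-system
microk8s kubectl logs calico-node-xph9f -n kube-system
microk8s inspect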