canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Infinite ContainerCreating, Failed to create pod sandbox: rpc error: code = Unknown desc #3257

Closed: johngrabner closed this issue 2 years ago

johngrabner commented 2 years ago

Summary

The pod fails to start, and I have no idea how MicroK8s got into this state. I tried stopping and then starting MicroK8s, rebooting the host, and deleting and recreating the pod.

https://github.com/kubernetes/kubernetes/issues/107561 appears to be similar, but on Windows 10. My system is Ubuntu.

That issue says this was fixed for Linux by https://github.com/containerd/containerd/issues/4604#issuecomment-1027293621. Since MicroK8s embeds containerd, this may be something the MicroK8s maintainers can evaluate.

Here is my system's output:

k get pods
NAME                              READY   STATUS              RESTARTS   AGE
mysql-deployment-bbd94b8f-rqtp6   0/1     ContainerCreating   0          14m

k describe pod mysql-deployment-bbd94b8f-rqtp6
Name:           mysql-deployment-bbd94b8f-rqtp6
Namespace:      default
Priority:       0
Node:           john-trx40-designare/xxxxxxxxx
Start Time:     Fri, 17 Jun 2022 19:38:15 -0500
Labels:         app=mysql-pod
                pod-template-hash=bbd94b8f
Annotations:    cni.projectcalico.org/containerID: c78baf2d1ee221513d06ef52f54b8a96a648faa518a86e8769ee5a96d4ce15ca
                cni.projectcalico.org/podIP: 
                cni.projectcalico.org/podIPs: 
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/mysql-deployment-bbd94b8f
Containers:
  mysql-container:
    Container ID:  
    Image:         mysql:5.7
    Image ID:      
    Port:          3306/TCP
    Host Port:     0/TCP
    Args:
      --sql-mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
    Mounts:
      /var/lib/mysql from mysql-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mrgjm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  mysql-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mysql-pv-claim
    ReadOnly:   false
  kube-api-access-mrgjm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From     Message
  ----     ------                  ----                   ----     -------
  Warning  FailedCreatePodSandBox  7m15s (x143 over 11m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/bdc3a2f0e141cc5996b41b7ff85d9b0ab8cb4a0599050879454505582c1b05ce/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  105s                   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/e35458b706261955295f4c6ebd88cfe5b153c6913ccee5b15f42a3933a023cc4/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  103s                   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/d2ac65b7afc2eb2fadd781706966ddd3e2e14f23a81a5a8f23d382c22dcd148f/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  99s                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/0101ae5fd0bc5b23171f3d0a62f63b3db0b7eda22e10cbbe6cbd4e3544f644cd/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  96s                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/b929aae7a8393625e89d55e32fec76c5b739452c2501630f8f0ebcee068e9dcd/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  92s                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/3223d52f90e25e5d7baa153bfcaa7baffe8733b379851a40803372ae23308aff/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  87s                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/90cb2aca61ac58bbc6a2494340be300adbeb873f9bc29b89ecb04844e0c43982/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  86s                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/32e7fa2683299a6c97ffe544c68bd46c17c0ff43c901e57a722526cd551fabe4/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  84s                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/163134ddbb413ca7d19d7d3051714a7f78b056fe2d6c61532fe177bff03ace2d/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  82s                    kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/045efa1e100a501a83d7c269ca83d02ef676943654248dc05cfc2dac983ed1aa/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
  Warning  FailedCreatePodSandBox  50s (x16 over 79s)     kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: unable to retrieve OCI runtime error (open /var/snap/microk8s/common/run/containerd/io.containerd.runtime.v2.task/k8s.io/0faa61dc52ae7267627ff5e637987ff5d7231ffb12e4793bffd1def01770b84b/log.json: no such file or directory): fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
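Every event above ends with the same `fork/exec <path>: exec format error` clause, so the useful part of these messages is which binary containerd failed to execute. The following is a minimal Python sketch that pulls that binary path and error out of a list of event messages. `failing_binaries` is a hypothetical helper, and it assumes the messages have already been copied out of the `describe` output as plain strings.

```python
import re
from collections import Counter

# Matches the tail of a FailedCreatePodSandBox message, e.g.
# "fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown"
# The path stops at the first whitespace or colon; the error stops at the next colon.
FORK_EXEC_RE = re.compile(r"fork/exec (?P<path>[^\s:]+): (?P<error>[^:]+)")


def failing_binaries(event_messages):
    """Count (binary path, error) pairs found in sandbox-creation event messages."""
    counts = Counter()
    for message in event_messages:
        match = FORK_EXEC_RE.search(message)
        if match:
            counts[(match.group("path"), match.group("error").strip())] += 1
    return counts
```

Applied to the events above, every message would map to `/usr/local/nvidia/toolkit/nvidia-container-runtime` with `exec format error`, which suggests the failure lies in that runtime binary rather than in any particular pod.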

What Should Happen Instead?

This pod normally starts very quickly.

Reproduction Steps

  1. MicroK8s got into this broken state somehow. I don't know how to reproduce it or how to get out of it.
  2. ...

Introspection Report

Can you suggest a fix?

Are you interested in contributing with a fix?

ktsakalozos commented 2 years ago

Hi @johngrabner, most of the time this error indicates an attempt to run an image built for an incompatible architecture. In your case you are trying to run mysql:5.7 [1], which seems to be available only for amd64. Is it possible that your hardware is arm64?

[1] https://hub.docker.com/layers/mysql/library/mysql/5.7/images/sha256-06c614dfc9720ccc0177acf700d0e0794f0efe3a032e78ea5318c30886ce62c1?context=explore
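To check the architecture question, one can compare the host's machine type with the platforms an image tag is published for. A minimal Python sketch, assuming the platform list (strings such as `linux/amd64`) is supplied by hand, for example copied from the image's Docker Hub page. `host_oci_arch`, `image_runs_on_host` and the mapping table are hypothetical and cover only a few common architectures.

```python
import platform

# Hypothetical mapping from `uname -m` style machine names to the
# architecture names used in OCI/Docker image platforms. Not exhaustive.
MACHINE_TO_OCI_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "armv7l": "arm",
    "ppc64le": "ppc64le",
    "s390x": "s390x",
}


def host_oci_arch(machine=None):
    """Return the OCI architecture name for `machine` (default: this host)."""
    machine = (machine or platform.machine()).lower()
    return MACHINE_TO_OCI_ARCH.get(machine, machine)


def image_runs_on_host(image_platforms, machine=None):
    """True if any linux/<arch>[/<variant>] entry matches the host architecture."""
    arch = host_oci_arch(machine)
    for entry in image_platforms:
        os_name, _, rest = entry.partition("/")
        if os_name == "linux" and rest.split("/")[0] == arch:
            return True
    return False
```

For an image published only as `linux/amd64`, `image_runs_on_host(["linux/amd64"])` would return `False` on an arm64 host, which is the mismatch that typically produces `exec format error`.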

johngrabner commented 2 years ago

It is definitely not a problem with the image, for several reasons:

  1. I have been running this pod (MySQL) unmodified for more than a year.
  2. Other pods (based on Python and Node.js) were exhibiting the same errors.
  3. I recovered by reinstalling MicroK8s, and I did not run a docker build; I just pushed the images to localhost:32000.
  4. The linked containerd issues have near-identical logs, and they have nothing to do with MySQL. The containerd problem is triggered by an overload of file I/O, which is similar to my case: I transferred a few million files before this problem occurred with MicroK8s.

The odd thing is that nothing else helped me recover. Running 'microk8s kubectl delete -f xxx' and then 'apply' on the same YAML did not help, 'microk8s stop' followed by 'microk8s start' did not help, and restarting the host did not help either. The only way to recover was to uninstall and reinstall MicroK8s.

I am not sure how MicroK8s decides which containerd version to use, but if that is a manual choice, it may be prudent to look at the containerd fix linked above, since recovering from this problem is painful and makes it look like MicroK8s is at fault.

In any event, feel free to close this item if you want. I recovered my system.

ktsakalozos commented 2 years ago

Do you happen to have a microk8s inspect tarball from when the incident happened? What version of MicroK8s are you running?

neoaggelos commented 2 years ago

Looking at the issue description, I'm not sure how the linked issues are related to this. Am I missing something?

In the error logs I see the following:

fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown

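/usr/local/nvidia/toolkit/nvidia-container-runtime is the runtime used by the GPU addon. The "exec format error" means the file exists but is not in a format the kernel can execute (for example, it is corrupted or truncated); a file that had been removed would instead fail with "no such file or directory". Did you by any chance enable and then disable the GPU addon?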
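To see what is actually at a runtime path such as `/usr/local/nvidia/toolkit/nvidia-container-runtime` (missing, empty, a wrapper script, or an ELF binary for a particular architecture), one can inspect its first bytes. A minimal Python sketch; `classify_executable` is a hypothetical helper, it reads only the ELF header's `e_machine` field, and it ignores binfmt_misc handlers (such as qemu-user) that can make otherwise foreign binaries runnable.

```python
import os
import struct

# A few ELF e_machine values (EM_386, EM_ARM, EM_X86_64, EM_AARCH64, EM_RISCV).
ELF_MACHINES = {3: "x86 (i386)", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}


def classify_executable(path):
    """Roughly classify what exec() would find at `path`."""
    if not os.path.exists(path):
        return "missing"  # exec fails with ENOENT ("no such file or directory")
    with open(path, "rb") as f:
        header = f.read(20)
    if not header:
        return "empty"  # exec of an empty file fails with ENOEXEC
    if header.startswith(b"#!"):
        return "script"  # the kernel hands it to the interpreter named after #!
    if header.startswith(b"\x7fELF"):
        if len(header) < 20:
            return "truncated ELF"
        # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
        byte_order = "<" if header[5] == 1 else ">"
        # e_machine is the 2-byte field at offset 18 in both 32- and 64-bit ELF.
        (machine,) = struct.unpack(byte_order + "H", header[18:20])
        return "elf:" + ELF_MACHINES.get(machine, f"e_machine={machine}")
    return "not an executable format"  # typically ENOEXEC ("exec format error")
```

A result of `empty`, `truncated ELF` or `not an executable format` would be consistent with the error above, as would an ELF architecture that differs from `platform.machine()`.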

In any case, for future readers, the troubleshooting step is to look in /var/snap/microk8s/current/args/containerd-template.toml and check which default runtime containerd is using:

    # default_runtime_name is the default runtime name to use.
    default_runtime_name = "${RUNTIME}"

The default value should be "${RUNTIME}". If it is different, revert the change and then restart MicroK8s with:

    sudo snap restart microk8s
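The same check can be scripted. A minimal Python sketch, assuming the template keeps the single-line `default_runtime_name = "..."` form shown above with a double-quoted value; it uses a regular expression rather than a full TOML parser to stay dependency-free, and `default_runtime_names` and `template_is_default` are hypothetical helpers.

```python
import re
from pathlib import Path

# Path and expected value as given in the comment above.
TEMPLATE = Path("/var/snap/microk8s/current/args/containerd-template.toml")
EXPECTED = "${RUNTIME}"

# Matches an uncommented `default_runtime_name = "..."` line; lines starting
# with "#" (such as the explanatory comment) are skipped.
DEFAULT_RUNTIME_RE = re.compile(
    r'^[ \t]*default_runtime_name[ \t]*=[ \t]*"([^"]*)"', re.MULTILINE
)


def default_runtime_names(toml_text):
    """Return every default_runtime_name value found in the template text."""
    return DEFAULT_RUNTIME_RE.findall(toml_text)


def template_is_default(toml_text):
    """True only if the setting is present and every occurrence equals ${RUNTIME}."""
    names = default_runtime_names(toml_text)
    return bool(names) and all(name == EXPECTED for name in names)
```

On a node, `template_is_default(TEMPLATE.read_text())` returning `False` would mean the template no longer uses `"${RUNTIME}"` (or the setting is missing) and should be reverted before restarting MicroK8s.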

Closing the issue, please reopen if this appears again.