Closed johngrabner closed 2 years ago
Hi @johngrabner, most of the time this error indicates an attempt to run an image built for an incompatible architecture. In your case you are trying to run mysql:5.7 [1], which seems to be available on amd64. Is it possible the architecture of your hardware is arm64?
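A quick way to rule the architecture question in or out is to compare the host architecture with the one the image was built for. The sketch below is only an illustration: normalize_arch is a hypothetical helper (not part of MicroK8s or Docker) that maps the names printed by uname -m to the names container images commonly use.

```shell
# normalize_arch: hypothetical helper, not part of MicroK8s or Docker.
# Maps a machine name as printed by `uname -m` to the architecture
# name commonly used for container images.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    armv7l)        echo "arm" ;;
    *)             echo "$1" ;;   # unknown names are passed through unchanged
  esac
}
```

Calling normalize_arch "$(uname -m)" on the host prints the name to compare against the architectures the image is published for.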
It is definitely not the case that I messed up the image, for several reasons: 1) I have been running this pod (MySQL) unmodified for more than a year. 2) Other pods (Python- and Node.js-based) were exhibiting the same errors. 3) I recovered by reinstalling microk8s without doing a docker build; I just pushed the images to localhost:32000. 4) The URLs for the 'containerd' problem have near-identical logs, and that problem has nothing to do with MySQL. The 'containerd' problem is triggered by an overload of file I/O, which is similar to my case: I transferred a few million files before this problem occurred with microk8s.
The odd thing is that I could not recover by running 'microk8s kubectl delete -f xxx' and then applying the same YAML. 'microk8s stop' followed by 'microk8s start' did not help. Restarting the host also did not help. The only way to recover was to uninstall microk8s and reinstall it.
I am not sure how microk8s decides which version of 'containerd' to use, but if the choice is manual, it may be prudent to look at the 'containerd' fix linked in the issue description, since the recovery for this problem is kind of nasty and makes it look like microk8s is at fault.
In any event, feel free to close this item if you want. I recovered my system.
Do you happen to have a microk8s inspect tarball from when the incident happened? What is the version of MicroK8s you have?
Looking at the issue description, I'm not sure how the linked issues are related to this. Am I missing something?
In the error logs I see the following:
fork/exec /usr/local/nvidia/toolkit/nvidia-container-runtime: exec format error: unknown
/usr/local/nvidia/toolkit/nvidia-container-runtime is the runtime used by the GPU addon, and the error seems to happen because the file is either corrupted or removed. Did you by any chance enable and then disable the GPU addon?
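To check whether the runtime file is missing or damaged, its first bytes can be inspected. This is a rough sketch under assumptions: runtime_file_status is a hypothetical helper, and it only distinguishes a missing file, an empty file, an ELF binary, a script starting with "#!", and anything else. A file can pass this check and still be unusable.

```shell
# runtime_file_status: hypothetical helper that reports why the kernel
# might refuse to execute the file given as "$1".
runtime_file_status() {
  f="$1"
  if [ ! -e "$f" ]; then echo "missing"; return 1; fi
  if [ ! -s "$f" ]; then echo "empty"; return 1; fi
  # Read the first four bytes of the file as a hex string.
  magic=$(head -c 4 "$f" | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    7f454c46) echo "elf" ;;                    # ELF magic number
    2321*)    echo "script" ;;                 # file starts with "#!"
    *)        echo "unrecognized"; return 1 ;; # likely corrupted
  esac
}
```

For example, runtime_file_status /usr/local/nvidia/toolkit/nvidia-container-runtime prints one of the five states; "missing", "empty", or "unrecognized" would be consistent with the exec format error above.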
In any case, for future readers, the troubleshooting would be to look in /var/snap/microk8s/current/args/containerd-template.toml and check the default runtime in use by containerd:
# default_runtime_name is the default runtime name to use.
default_runtime_name = "${RUNTIME}"
The default value should be "${RUNTIME}". If it is different, revert the change and then restart MicroK8s with:
sudo snap restart microk8s
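The check described above can be scripted. The following is a minimal sketch, assuming the setting appears on its own line in the form shown; get_default_runtime and runtime_is_default are hypothetical helper names, not MicroK8s commands.

```shell
# get_default_runtime: hypothetical helper. Prints the quoted value of the
# first uncommented default_runtime_name line in the file given as "$1".
get_default_runtime() {
  sed -n 's/^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# runtime_is_default: hypothetical helper. Succeeds only if the value is
# the literal placeholder ${RUNTIME} (single quotes prevent expansion).
runtime_is_default() {
  [ "$(get_default_runtime "$1")" = '${RUNTIME}' ]
}
```

Running runtime_is_default /var/snap/microk8s/current/args/containerd-template.toml returns a non-zero exit status if the template has been changed away from the placeholder, in which case the revert and restart described above apply.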
Closing the issue, please reopen if this appears again.
Summary
Pod fails to start. No idea how microk8s got into this state. Tried stopping and then starting microk8s. Tried rebooting the host. Tried deleting the pod and recreating it.
https://github.com/kubernetes/kubernetes/issues/107561 appears to be similar, but on Windows 10. My system is Ubuntu.
That issue claims this is fixed for Linux by https://github.com/containerd/containerd/issues/4604#issuecomment-1027293621. Since microk8s embeds 'containerd', maybe this is something the microk8s team can evaluate.
Here is my system's output
What Should Happen Instead?
This pod normally starts very quickly.