1. Issue or feature description
I created a multi-node k0s Kubernetes cluster by following this blog post (https://www.padok.fr/en/blog/k0s-kubernetes-gpu), and I'm getting the same error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configured
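For context, containerd reports this error when the configuration it actually loaded has no CRI runtime handler named nvidia. Below is a minimal sketch for checking a containerd config for that handler. It assumes the version 2 layout (the container-toolkit log reports "Config version: 2") with runtimes declared under the `io.containerd.grpc.v1.cri` plugin table. The helper name `configured_runtimes` and the sample TOML are hypothetical and are not taken from this cluster.

```python
import re

# Matches a version-2 containerd table header that declares a CRI runtime
# handler, for example:
#   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
# Sub-tables such as "...runtimes.nvidia.options" are deliberately not matched.
RUNTIME_HEADER = re.compile(
    r'^\s*\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\.runtimes\.'
    r'"?([A-Za-z0-9_-]+)"?\]\s*$',
    re.MULTILINE,
)


def configured_runtimes(config_text: str) -> set[str]:
    """Return the runtime handler names declared in a containerd config."""
    return set(RUNTIME_HEADER.findall(config_text))


# Hypothetical sample config (assumption), not the file from this cluster.
SAMPLE = '''
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
'''

if __name__ == "__main__":
    names = configured_runtimes(SAMPLE)
    print(sorted(names))
    print("nvidia" in names)
```

On a node, the text passed in would be the file containerd was started with. On k0s that is commonly /etc/k0s/containerd.toml, but that path is an assumption here and may differ between k0s versions.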
2. Steps to reproduce the issue
I followed this blog post: https://www.padok.fr/en/blog/k0s-kubernetes-gpu
Download k0s binary
Download k0sctl binary
Then create a k0sctl.yaml config file for a multi-node Kubernetes cluster:
k0sctl.yaml file
/tmp/k0s/containerd.toml file
Then run the command: k0sctl apply --config /path/to/k0sctl.yaml
Deploy NVIDIA GPU Operator
values.yaml file
Install Helm
Now, add the NVIDIA Helm repository:
1. Are drivers/container-toolkit pre-installed on the host or installed by the GPU operator?
2. OS version
Ubuntu 20.04.5 LTS
3. Status of all pods under gpu-operator namespace
4. Logs from init-containers
from device-plugin
Error from server (BadRequest): container "toolkit-validation" in pod "nvidia-device-plugin-daemonset-tbbgb" is waiting to start: PodInitializing
from container-toolkit
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
time="2023-01-10T11:51:53Z" level=info msg="Successfully loaded config"
time="2023-01-10T11:51:53Z" level=info msg="Config version: 2"
time="2023-01-10T11:51:53Z" level=info msg="Updating config"
time="2023-01-10T11:51:53Z" level=info msg="Successfully updated config"
time="2023-01-10T11:51:53Z" level=info msg="Flushing config"
time="2023-01-10T11:51:53Z" level=info msg="Successfully flushed config"
time="2023-01-10T11:51:53Z" level=info msg="Sending SIGHUP signal to containerd"
time="2023-01-10T11:51:53Z" level=info msg="Successfully signaled containerd"
time="2023-01-10T11:51:53Z" level=info msg="Completed 'setup' for containerd"
time="2023-01-10T11:51:53Z" level=info msg="Waiting for signal"
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1601 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 1601 G /usr/lib/xorg/Xorg 9MiB |
| 1 N/A N/A 1736 G /usr/bin/gnome-shell 8MiB |
+-----------------------------------------------------------------------------+