Error: failed to start container "alnair-vgpu-server": Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/4012b48f38e9057eb80787735e1bb47e7d86c9402d4a4976fd4b07020ae4c63b/merged/run/nvidia-persistenced/socket: no such device or address: unknown
Cause:
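nvidia-container-runtime initially mounts some files under /run/nvidia-persistenced.
However, alnair-vgpu-server mounts the host's /run at /run inside the container, because it uses /run/alnair.sock for communication. As a result, the contents of the container's /run directory are overwritten, and nvidia-container-cli can no longer create /run/nvidia-persistenced/socket.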
Solutions:
Change the alnair socket path to /run/alnair/alnair.sock, which is used in vgpu-server server.go and in the intercept lib client-register.
Mount only /run/alnair, rather than all of /run, in the alnair-vgpu-server container.