NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

nvidia-docker running error #176

Open yzawudi opened 7 months ago

yzawudi commented 7 months ago

1. Issue or feature description

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/ce57966999d99e98af2113a1f5c99f40792737c48903d5ba9f6ed5365aee8275/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.

2. Steps to reproduce the issue

When I run the Docker image through WSL Ubuntu on Windows 10, I get the error above. The command I run is:

docker run -d -it --privileged=true --gpus=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all --name cervix-ai-server --network host image:tag /bin/bash

I don't know what went wrong. My computer is a 2060 notebook. Output of nvidia-smi:

Mon Dec  4 16:17:30 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.53       Driver Version: 497.29       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 43%   40C    P8    15W / 170W |    667MiB /  6144MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

elezar commented 7 months ago

It seems as if the image that you are trying to run already contains a file at the location where we are trying to mount a library from the host. This happens when one (incorrectly) uses the NVIDIA Container Runtime to build an image -- mounting the drivers from the host and leaving empty files at these locations.

To confirm this, run:

docker run -ti <IMAGE> bash -c "ls -al /usr/lib/x86_64-linux-gnu/libnvidia-ml.*"

One workaround is to remove these files from the image before running it.
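
A minimal sketch of that cleanup step, under some assumptions: remove_empty_nvidia_libs is a hypothetical helper (not part of the NVIDIA Container Toolkit), the default directory is the one from the error message, and the file name patterns are a guess beyond libnvidia-ml.so.1, which is the only file named in the error. It deletes only zero-byte files, so any non-empty library left in the image stays in place for manual inspection.

```shell
#!/bin/sh
# Hypothetical helper: delete zero-byte files that look like NVIDIA driver
# libraries from a library directory, printing each path that is removed.
remove_empty_nvidia_libs() {
    # Default to the directory named in the error message.
    lib_dir="${1:-/usr/lib/x86_64-linux-gnu}"
    # Name patterns are an assumption; adjust them to the files the
    # mount error actually reports.
    find "$lib_dir" -maxdepth 1 -type f -size 0 \
        \( -name 'libnvidia-*.so*' -o -name 'libcuda.so*' \) \
        -print -exec rm -f {} \;
}
```

One possible way to use it is to run the function inside a container started from the image without the --gpus option, so that nothing from the host is mounted, and then save the result as a new image, for example with docker commit. Rebuilding the image without the NVIDIA Container Runtime avoids leaving the stale files behind in the first place.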