Closed mvelean1 closed 6 months ago
Hi @mvelean1, the message you are getting might be because NVIDIA is not properly registered with Docker. If your CUDA installation is otherwise working, I suggest restarting the Docker daemon:
sudo systemctl restart docker
If this still doesn't work, it might be that the nvidia-cuda-toolkit is not correctly installed on your computer. Could you check the output of the following command?
nvcc -V
Please let me know whether restarting the Docker daemon worked or whether you might need to install nvidia-cuda-toolkit.
Thank you very much for using DL4MicEverywhere! 😄
Thank you for your answer and your help !
It didn't work after restarting Docker. Here's the output of nvcc -V :
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
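For anyone who wants to script this check, the release number can be pulled out of the nvcc -V output with a short helper. This is only a minimal sketch: cuda_release is a hypothetical name, not part of DL4MicEverywhere, and it assumes the "release X.Y," wording shown in the output above.

```shell
# Print the CUDA release found in `nvcc -V` output read from stdin.
# cuda_release is a hypothetical helper name used for illustration only.
cuda_release() {
  # Keep only the version that follows "release " and precedes the comma.
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Example usage (requires nvcc on PATH, not executed here):
#   nvcc -V | cuda_release
```

If nothing is printed, either nvcc is missing or its output does not match the assumed format.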
Okay, so the CUDA toolkit is installed. Can you try running a simple Docker image from NVIDIA to test whether that works correctly?
docker run --gpus all --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi
It should give you the same output as running nvidia-smi directly in your terminal.
Does this work?
I get the same error ...
`docker run --gpus all --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi
Unable to find image 'nvidia/cuda:12.3.2-devel-ubuntu22.04' locally
12.3.2-devel-ubuntu22.04: Pulling from nvidia/cuda
Digest: sha256:6655d5fc2fb48580255a5021a81c379c325a457b74b77ac823ed67e4faa32aeb
Status: Downloaded newer image for nvidia/cuda:12.3.2-devel-ubuntu22.04
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].`
Okay, then the issue seems to be not with the nvidia-cuda-toolkit but with the nvidia-container-toolkit, which is needed to connect NVIDIA to the Docker containers. You will need to install nvidia-container-toolkit and restart the Docker daemon again, just in case. Try the following commands:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Many blog posts report that this fixed the issue, so I hope it will fix the problem for you too. If not, we can find another way to fix it 😄
I already tried that, sounded like it would solve my problem, but it didn't :( I get the same error unfortunately !
I think it's a permissions problem: when I run Docker in rootless mode, I can run a Docker image from NVIDIA. However, I didn't manage to run DL4MicEverywhere in rootless mode.
Okay, so does the NVIDIA Docker image work if you run it with sudo?
sudo docker run --gpus all --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi
Unfortunately, no :(
'CUDA Version 12.3.2
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Failed to initialize NVML: Unknown Error '
Hello,
I solved the issue, it's working well now!
I think the issue came from the fact that nvidia-container-toolkit was not properly registered in the Docker config file.
This toolkit is indeed required to use the GPU with Docker on Linux.
I installed it as you said, using
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
but the official documentation says to also run
sudo nvidia-ctk runtime configure --runtime=docker
to register it in the Docker daemon's config file, /etc/docker/daemon.json.
However, this doesn't work for Docker Desktop, which doesn't use this file, so I passed the path of the file explicitly using dockerd --config-file /etc/docker/daemon.json.
I'm not sure it's very clean, but it works.
Thank you for your support !
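For anyone who lands here with the same error, the steps from this thread can be collected into a small shell sketch. The function names has_nvidia_runtime and register_nvidia_runtime are hypothetical and only for illustration. The check is a heuristic: it assumes that nvidia-ctk runtime configure leaves an "nvidia" entry in /etc/docker/daemon.json, and it does not cover Docker Desktop, which, as noted above, does not read that file.

```shell
#!/bin/sh
# Heuristic check: does the Docker daemon config mention an "nvidia" runtime?
# Defaults to the path named in this thread; pass another path to override.
# (Assumption: nvidia-ctk writes a runtime entry keyed "nvidia" into this file.)
has_nvidia_runtime() {
  config_file="${1:-/etc/docker/daemon.json}"
  [ -f "$config_file" ] && grep -q '"nvidia"' "$config_file"
}

# The sequence reported to work in this thread (Docker Engine on Ubuntu).
# Wrapped in a function so nothing runs until it is called explicitly.
register_nvidia_runtime() {
  sudo apt-get update &&
    sudo apt-get install -y nvidia-container-toolkit &&
    sudo nvidia-ctk runtime configure --runtime=docker &&
    sudo systemctl restart docker
}

# Example usage (not executed here):
#   has_nvidia_runtime || register_nvidia_runtime
#   docker run --gpus all --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi
```

If the final docker run prints the usual nvidia-smi table, the GPU should be visible to containers.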
Hi @mvelean1, that is wonderful news!!
I see, sorry that I missed the step of registering the NVIDIA toolkit in the Docker configuration. I'm super glad that you found the solution, because I had run out of other ideas. Yes, I think your explanation is clear, and it will also help other people who hit the same error in the future.
Thanks a lot for all the feedback and for posting all the steps and outputs that you went through!! 😄
Hello, After successfully using ZeroCostDL4Mic to train a model with Stardist3D, I would now like to train locally using DL4MicEverywhere. I use Ubuntu 22.04 and have installed CUDA (the latest version, 12.4) and Docker. The Stardist3D notebook works perfectly when I don't use the GPU, but when I use it I get the error: could not select device driver "" with capabilities: [[gpu]] ... My CUDA installation is otherwise working. Is there anything I missed? Thank you in advance!