HenriquesLab / DL4MicEverywhere

Bringing the ZeroCostDL4Mic experience running everywhere via easy-to-install docker images

Impossible to use GPU in docker desktop #37

Closed mvelean1 closed 6 months ago

mvelean1 commented 7 months ago

Hello,

After successfully using ZeroCostDL4Mic to train a model with StarDist3D, I would now like to train locally using DL4MicEverywhere. I use Ubuntu 22.04 and have installed CUDA (the latest version, 12.4) and Docker. The StarDist3D notebook works perfectly when I don't use the GPU, but when I do, I get the error:

could not select device driver "" with capabilities: [[gpu]] ...

My CUDA installation is otherwise working normally. Is there anything I missed? Thank you in advance!
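For context, the host-side driver can be sanity-checked independently of Docker; if the command below prints the usual driver and GPU table, the problem lies in the Docker-to-Nvidia bridge rather than in CUDA itself:

nvidia-smi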

IvanHCenalmor commented 7 months ago

Hi @mvelean1,

The message that you are getting might mean that Nvidia is not properly registered with Docker. If your CUDA installation is working normally, I would first suggest restarting the Docker daemon:

sudo systemctl restart docker

In case this is still not working, it might be that the nvidia-cuda-toolkit is not correctly installed on your computer. Could you check the output of the following command?

nvcc -V

Please let me know if restarting the Docker daemon worked, or whether you need to install the nvidia-cuda-toolkit.
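A quick way to see whether an nvidia runtime is actually registered with the daemon is to query Docker's info output (a small diagnostic sketch; if nvidia is missing from the list, the container-toolkit step discussed below will be needed):

docker info --format '{{json .Runtimes}}'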

Thank you very much for using DL4MicEverywhere! 😄

mvelean1 commented 7 months ago

Thank you for your answer and your help!

It didn't work after restarting Docker. Here's the output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

IvanHCenalmor commented 7 months ago

Okay, so the CUDA toolkit is installed. Can you try running a simple Docker image from Nvidia to test whether GPU access works correctly?

docker run --gpus all --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi

It should give you the same output as running nvidia-smi directly in your terminal.

Does this work?

mvelean1 commented 7 months ago

I get the same error ...

docker run --gpus all --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi
Unable to find image 'nvidia/cuda:12.3.2-devel-ubuntu22.04' locally
12.3.2-devel-ubuntu22.04: Pulling from nvidia/cuda
Digest: sha256:6655d5fc2fb48580255a5021a81c379c325a457b74b77ac823ed67e4faa32aeb
Status: Downloaded newer image for nvidia/cuda:12.3.2-devel-ubuntu22.04
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

IvanHCenalmor commented 7 months ago

Okay, then the issue seems to be not with the nvidia-cuda-toolkit but with the nvidia-container-toolkit, which is needed to make the connection between Nvidia and the Docker containers. For that you will need to install nvidia-container-toolkit and then restart the Docker daemon again, just in case. Try the following commands:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Many blog posts report that this fixed their issue, so I hope it will fix the problem. If not, we can find another way to fix it 😄

mvelean1 commented 7 months ago

I already tried that; it sounded like it would solve my problem, but it didn't :( I get the same error, unfortunately!

mvelean1 commented 6 months ago

I think it's a permissions problem: when I run Docker in rootless mode, I can run a Docker image from Nvidia. However, I didn't manage to run DL4MicEverywhere in rootless mode.
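Checking which daemon the docker CLI is actually pointed at can narrow this down, since Docker Desktop on Linux runs its own engine in a VM and registers a context separate from the native Docker Engine (a quick sketch; the context names assume a standard Docker Desktop install):

docker context ls            # Docker Desktop typically appears as desktop-linux
docker context use default   # switch back to the native Docker Engine socket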

IvanHCenalmor commented 6 months ago

Okay, so does the Nvidia Docker image work if you run it with sudo?

sudo docker run --gpus all --rm nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi

mvelean1 commented 6 months ago

Unfortunately, no :(

CUDA Version 12.3.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Failed to initialize NVML: Unknown Error
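Since the container itself starts (the license banner prints) but NVML fails, one more thing worth checking is whether the container toolkit can see the driver on its own, using the nvidia-container-cli binary that ships with it (a suggested check, assuming the toolkit is installed and on the PATH):

nvidia-container-cli info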

mvelean1 commented 6 months ago

Hello, I solved the issue; it's working well now! I think the issue came from the fact that the nvidia-container-toolkit was not properly registered in the Docker config file. This toolkit is indeed required to use the GPU with Docker on Linux.

I installed it as you said, using:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

but the official docs say to also run

sudo nvidia-ctk runtime configure --runtime=docker

to register it in the Docker daemon's config file at /etc/docker/daemon.json.

However, this doesn't work for Docker Desktop, which doesn't use this file, so I pointed the daemon at it explicitly using dockerd --config-file /etc/docker/daemon.json.

I'm not sure it's very clean, but it works.
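For reference, after running sudo nvidia-ctk runtime configure --runtime=docker, the runtime entry written to /etc/docker/daemon.json should look roughly like this (a sketch; the exact keys can vary between nvidia-ctk versions):

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}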

Thank you for your support!

IvanHCenalmor commented 6 months ago

Hi @mvelean1,

That is wonderful news!!

I see, sorry that I missed that part about registering the Nvidia toolkit in the Docker configuration. I'm super glad that you found the solution, because I was not finding anything else that might work. And yes, I think that is clear, also for other people who might hit the same error in the future.

Thanks a lot for all the feedback and for writing up all the steps and output that you went through!! 😄