Closed AdamGoodApp closed 1 week ago
Hi, can you tell me which version of Ubuntu Server you're using?
Also, just to be sure, did you follow all the instructions in the NVIDIA Container Toolkit guide, including the configuration step?
And is the output from the first command from inside the container or from your host machine? You can check whether Docker recognizes the GPU by running docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Same issue here on Ubuntu Server 22.04. nvidia-smi works both inside and outside Docker, the configuration has been set up, and I'm able to use the GPU in other Docker containers (including automatic1111) without issue. My NVIDIA driver is 535, though, which only supports CUDA 12.2, and 12.3 requires 545+, so it might be a driver version issue.
Can confirm, this is a driver version issue. Because the Dockerfile is based on CUDA 12.3, there is a minimum requirement of the NVIDIA 545 driver on Linux, but no error message ever says so. Ubuntu 22.04 LTS seems to recommend the 535 driver, which is only compatible with CUDA 12.2. 22.04 makes 545 available in the driver installer, so it's not a big chore to upgrade, but it would be nice if this were clear. Running ubuntu-drivers install nvidia:545 and rebooting is what fixed the issue for me.
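For anyone who wants to check this before starting the container, here is a minimal sketch that compares a driver version against the minimums mentioned in this thread. The table (12.2 needs 535, 12.3 needs 545) is taken from the comments above rather than from an official source, and the function names are made up for illustration. The only external call is nvidia-smi with its --query-gpu=driver_version option.

```python
import subprocess

# Minimum driver major version per CUDA release, as reported in this thread.
# This is an assumption copied from the comments above, not an official table.
MIN_DRIVER_MAJOR = {"12.2": 535, "12.3": 545}


def driver_major(version: str) -> int:
    """Return the major part of a driver version string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def driver_supports(driver_version: str, cuda_version: str) -> bool:
    """True if the driver major version meets the minimum listed for cuda_version."""
    # Reduce e.g. '12.3.2' to '12.3' so it matches the keys of the table.
    major_minor = ".".join(cuda_version.split(".")[:2])
    return driver_major(driver_version) >= MIN_DRIVER_MAJOR[major_minor]


def installed_driver_version() -> str:
    """Ask nvidia-smi for the host driver version (needs an NVIDIA driver installed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    # One line per GPU; the driver version is the same for all of them.
    return result.stdout.splitlines()[0].strip()
```

On a host with the driver installed, driver_supports(installed_driver_version(), "12.3.2") should tell you whether the image's CUDA version is likely to work. A CUDA version missing from the table raises KeyError instead of guessing.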
Thanks for the help, @Casao. I updated the README to include a warning about the driver version, along with your fix.
When I updated the driver to 555, it happened again.
@NicoNicoNico123 can you tell me which distro you're using so that I can try the new drivers?
Distro: nvidia/cuda:12.3.2-runtime-ubuntu22.04
But how do I manually update the driver inside the Ubuntu container?
I pushed an update that changes the CUDA base image to 12.4.1-runtime-ubuntu22.04; however, I'm not sure this will fix it. I was asking which distro you're running on your computer (e.g. Ubuntu 22.04, Arch, etc.) so that I can boot into it and see if I can make it work with the 555 drivers. My distro (Fedora 40) hasn't been updated to this driver version yet.
For now I've reverted to the 545 drivers on Windows, and everything works fine :)
When running docker compose up, I get this error:
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
Running on an Ubuntu Server.
NVIDIA Container Toolkit is installed and I can see the GPU.
Not sure if the problem is that my NVIDIA GPU is display 1. If so, can I change the Docker setup to use it?
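The RuntimeError above comes from a check on whether PyTorch can reach CUDA, so it can help to ask PyTorch directly from inside the container. The following is a rough sketch, not part of this project: torch_gpu_report is a hypothetical helper name, and it uses only torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count() and torch.cuda.get_device_name().

```python
def torch_gpu_report() -> dict:
    """Collect what PyTorch can see; safe to call without torch or a GPU."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        # CUDA version this torch build targets, or None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        # List every GPU torch can address, in index order.
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    print(torch_gpu_report())
```

If cuda_available is False while nvidia-smi works in the same container, the cause is more likely a mismatch between the driver and the CUDA version than the GPU's display index, though this sketch cannot confirm that on its own.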