Yummiii / sd-webui-forge-docker

A docker image for Stable Diffusion WebUI Forge
16 stars, 9 forks

Torch is not able to use GPU #1

Closed: AdamGoodApp closed this issue 1 week ago

AdamGoodApp commented 6 months ago

When running docker compose up, I get this error:

RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

Running on an Ubuntu Server.

NVIDIA Container Toolkit is installed and I can see the GPU.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:00:10.0 Off |                  N/A |
|  0%   34C    P8               6W / 370W |      1MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Not sure if the problem is that my NVIDIA GPU is display 1? If so, can I change the Docker configuration to use it?

  *-display:0
       description: VGA compatible controller
       product: bochs-drmdrmfb
       physical id: 2
       bus info: pci@0000:00:02.0
       logical name: /dev/fb0
       version: 02
       width: 32 bits
       clock: 33MHz
       capabilities: vga_controller bus_master rom fb
       configuration: depth=32 driver=bochs-drm latency=0 resolution=1280,800
       resources: irq:0 memory:f2000000-f2ffffff memory:fe6d4000-fe6d4fff memory:c0000-dffff
  *-display:1
       description: VGA compatible controller
       product: GA102 [GeForce RTX 3090]
       vendor: NVIDIA Corporation
       physical id: 10
       bus info: pci@0000:00:10.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:46 memory:fd000000-fdffffff memory:e0000000-efffffff memory:f0000000-f1ffffff ioport:f000(size=128) memory:c0000-dffff

Yummiii commented 5 months ago

Hi, can you tell me which version of Ubuntu Server you're using?

Also, just to be sure, did you follow all the instructions in the NVIDIA Container Toolkit guide, including the configuration step?

And is the nvidia-smi output above from inside the container or from the host? You can check whether Docker recognizes the GPU by running: docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
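
The error in the original report appears when PyTorch cannot see a CUDA device from inside the container. To complement the nvidia-smi test, here is a small sketch that asks PyTorch directly. It is meant to be run with the Python interpreter inside the container. The function name torch_cuda_report is hypothetical, and the script makes no claim about how Forge itself performs its check.

```python
import importlib.util


def torch_cuda_report():
    """Summarize whether PyTorch is importable and can see a CUDA device.

    Hypothetical helper for illustration; not part of Forge or this image.
    """
    # Avoid an ImportError if this interpreter has no PyTorch at all.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        # True only if the driver and at least one GPU are usable.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False while nvidia-smi works in the same container, the GPU is visible to the container but not usable by this PyTorch build, which points toward a driver or CUDA version mismatch rather than a missing device.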

Casao commented 5 months ago

Same issue here on Ubuntu Server 22.04. nvidia-smi works both inside and outside Docker, the configuration has been set up, and I can use the GPU in other Docker containers (including automatic1111) without issue. My NVIDIA driver is 535, though, which only supports CUDA 12.2, and CUDA 12.3 requires 545+, so it might be a driver version issue.

Casao commented 5 months ago

Can confirm, this is a driver version issue. Because the Dockerfile is based on CUDA 12.3, it needs at least the NVIDIA 545 driver on Linux, but no error message ever says so. Ubuntu 22.04 LTS seems to recommend the 535 driver, which only supports up to CUDA 12.2. 22.04 does make 545 available in the driver installer, so upgrading is not a big chore, but it would be nice if the requirement were clear. Running ubuntu-drivers install nvidia:545 and rebooting is what fixed the issue for me.
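
The mismatch described above can be spotted before starting the container. The following is a minimal sketch, not part of this repository: it parses the header line that nvidia-smi prints on the host and compares the CUDA version reported there (the highest one the installed driver supports) with the CUDA version in the base image tag. The function names and regular expressions are illustrative assumptions, and the sample strings are taken from this thread.

```python
import re

# Header line copied from the nvidia-smi output earlier in this thread.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 535.161.07             "
    "Driver Version: 535.161.07   CUDA Version: 12.2     |"
)


def parse_smi_header(text):
    """Return (driver_version, (cuda_major, cuda_minor)) or None if absent."""
    m = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*(\d+)\.(\d+)", text
    )
    if m is None:
        return None
    return m.group(1), (int(m.group(2)), int(m.group(3)))


def image_cuda_version(image_tag):
    """Extract (major, minor) from a tag such as nvidia/cuda:12.3.2-runtime-..."""
    if ":" not in image_tag:
        return None
    m = re.match(r"(\d+)\.(\d+)", image_tag.rsplit(":", 1)[1])
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def driver_supports_image(smi_text, image_tag):
    """True/False if both versions parse, None if either cannot be read."""
    header = parse_smi_header(smi_text)
    wanted = image_cuda_version(image_tag)
    if header is None or wanted is None:
        return None
    _, max_cuda = header
    # Tuples compare element by element, so (12, 2) < (12, 3).
    return max_cuda >= wanted


if __name__ == "__main__":
    tag = "nvidia/cuda:12.3.2-runtime-ubuntu22.04"
    print(driver_supports_image(SAMPLE_HEADER, tag))
```

This is only a heuristic based on the version numbers discussed here. The authoritative check is whatever the container runtime and PyTorch do at startup.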

Yummiii commented 5 months ago

Thanks for the help, @Casao. I updated the README to include a warning about the driver version, along with your fix.

NicoNicoNico123 commented 4 months ago

After I updated the driver to 555, it happened again.

Yummiii commented 4 months ago

@NicoNicoNico123 can you tell me which distro you're using so that I can try the new drivers?

NicoNicoNico123 commented 4 months ago

Distro:

nvidia/cuda:12.3.2-runtime-ubuntu22.04

But how do I manually update the driver inside the Ubuntu container?

Yummiii commented 4 months ago

I pushed an update that changes the CUDA base image to 12.4.1-runtime-ubuntu22.04, but I'm not sure this will fix it. I was asking which distro you're running on your computer (e.g. Ubuntu 22.04, Arch, etc.) so that I can boot into it and see whether I can make it work with the 555 drivers. My distro (Fedora 40) hasn't been updated to that driver version yet.

NicoNicoNico123 commented 4 months ago

Right now I've reverted to the 545 drivers on Windows, and everything works fine :)