AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Torch is not able to use GPU #16017

Open cklogic opened 1 month ago

cklogic commented 1 month ago

What happened?

(sd-web) [root@localhost stable-diffusion-webui]# CUDA_VISIBLE_DEVICES=0 ./webui.sh --listen --enable-insecure-extension-access --api

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on root user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.17
Check TCMalloc: libtcmalloc_minimal.so.4
libtcmalloc_minimal.so.4 is not linked with libpthread will trigger undefined symbol: pthread_Key_create error
Check TCMalloc: libtcmalloc.so.4
libtcmalloc.so.4 is linked with libpthread,execute LD_PRELOAD=/lib64/libtcmalloc.so.4
/home/xxx/anaconda3/envs/sd-web/bin/python3
Python 3.11.0 (main, Mar  1 2023, 18:26:19) [GCC 11.2.0]
Version: v1.8.0
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
Traceback (most recent call last):
  File "/app/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/app/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 388, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
(sd-web) [root@localhost stable-diffusion-webui]# /home/xxx/anaconda3/envs/sd-web/bin/python3 -c 'import torch;print(torch.__version__);print(torch.cuda.is_available())'
2.2.1
True
(sd-web) [root@localhost stable-diffusion-webui]# /home/xxx/anaconda3/envs/sd-web/bin/python3 -c 'import torch;print(torch.__version__);print(torch.cuda.is_available())'
2.2.1
True
(sd-web) [root@localhost stable-diffusion-webui]# python3 -c 'import torch;print(torch.__version__);print(torch.cuda.is_available())'
2.2.1
True
(sd-web) [root@localhost stable-diffusion-webui]# python -c 'import torch;print(torch.__version__);print(torch.cuda.is_available())'
2.2.1
True
(sd-web) [root@localhost stable-diffusion-webui]#

Steps to reproduce the problem

x

What should have happened?

x

What browsers do you use to access the UI ?

Google Chrome

Sysinfo

CentOS Linux release 7.9.2009 (Core) 5.4.275-1.el7.elrepo.x86_64

Console logs

x

Additional information

No response

cklogic commented 1 month ago

I uninstalled and reinstalled torch, but I still get the same error.

Installing collected packages: triton, torch, torchvision
  Attempting uninstall: triton
    Found existing installation: triton 2.2.0
    Uninstalling triton-2.2.0:
      Successfully uninstalled triton-2.2.0
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.17.1
    Uninstalling torchvision-0.17.1:
      Successfully uninstalled torchvision-0.17.1
Successfully installed torch-2.1.2+cu121 torchvision-0.16.2+cu121 triton-2.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Traceback (most recent call last):
  File "/app/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/app/stable-diffusion-webui/launch.py", line 39, in main
    prepare_environment()
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 388, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

ysun commented 1 month ago

I have the same issue with Python 3.10.6.

cgstag commented 1 month ago

stable-diffusion-webui assumes that the CUDA Toolkit is installed on your computer. Make sure you have a recent version installed, then run webui-user again.
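A quick way to see whether the toolkit's compiler and the driver utility are visible at all is to look them up on PATH. This is only a minimal sketch: the helper name `find_cuda_tools` is made up for illustration, and finding `nvcc` on PATH is a rough indicator rather than proof that the toolkit is set up correctly.

```python
import shutil


def find_cuda_tools(names=("nvcc", "nvidia-smi")):
    """Map each tool name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_cuda_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If `nvcc` is reported as not found while `nvidia-smi` is present, the driver is installed but the toolkit is either missing or not on PATH.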

ysun commented 1 month ago

@cgstag Thanks for your reply. But I have already installed CUDA and the CUDA toolkit:

nvidia-cuda-dev/noble,now 12.0.146~12.0.1-4build4 amd64 [installed]
nvidia-cuda-gdb/noble,now 12.0.140~12.0.1-4build4 amd64 [installed]
nvidia-cuda-toolkit/noble,now 12.0.140~12.0.1-4build4 amd64 [installed]
nvidia-cuda-toolkit-doc/noble,now 12.0.1-4build4 all [installed]
nvidia-cuda-toolkit-gcc/noble,now 12.0.1-4build4 amd64 [installed]
python-pycuda-doc/noble,now 2024.1~dfsg-1build2 all [installed]
python3-pycuda/noble,now 2024.1~dfsg-1build2 amd64 [installed]

I'm using Ubuntu 24.04. The output of nvidia-smi is as follows:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:17:00.0 Off |                  N/A |
|  0%   37C    P8              14W / 320W |     12MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        Off | 00000000:65:00.0  On |                  N/A |
|  0%   49C    P8              12W / 320W |    378MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2980      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      2980      G   /usr/lib/xorg/Xorg                          159MiB |
|    1   N/A  N/A      3293      G   /usr/bin/gnome-shell                         66MiB |
|    1   N/A  N/A      4596      G   ...98,262144 --variations-seed-version      102MiB |
+---------------------------------------------------------------------------------------+
sunyi@sunyi-station-ai:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

I have no idea what's wrong here. It has been bothering me for a while.

The error message:

sunyi@sunyi-station-ai:~/AI/stable-diffusion-webui-bak$ ./webui.sh 

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye), Fedora 34+ and openSUSE Leap 15.4 or newer.
################################################################

################################################################
Running on sunyi user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
glibc version is 2.39
Cannot locate TCMalloc. Do you have tcmalloc or google-perftool installed on your system? (improves CPU memory usage)
Python 3.10.6 (main, Jun 14 2024, 23:52:41) [GCC 13.2.0]
Version: v1.9.4
Commit hash: feee37d75f1b168768014e4634dcb156ee649c05
Traceback (most recent call last):
  File "/home/sunyi/AI/stable-diffusion-webui-bak/launch.py", line 48, in <module>
    main()
  File "/home/sunyi/AI/stable-diffusion-webui-bak/launch.py", line 39, in main
    prepare_environment()
  File "/home/sunyi/AI/stable-diffusion-webui-bak/modules/launch_utils.py", line 386, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

ysun commented 1 month ago

I figured it out. The "CUDA Version: 12.2" reported by nvidia-smi doesn't mean that CUDA is installed; it only indicates the highest CUDA version the driver supports. After I installed the CUDA toolkit, the issue was solved. Download it from: https://developer.nvidia.com/cuda-toolkit-archive
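To make the distinction concrete, here is a small sketch that pulls the two version numbers out of the nvidia-smi header and the nvcc output quoted earlier in this thread. The function names are hypothetical, and the regular expressions only assume the output formats shown above.

```python
import re


def driver_cuda_version(nvidia_smi_text):
    """Return the 'CUDA Version' field from nvidia-smi's header, or None."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", nvidia_smi_text)
    return match.group(1) if match else None


def toolkit_cuda_version(nvcc_text):
    """Return the release number from `nvcc --version` output, or None."""
    match = re.search(r"release\s+([\d.]+)", nvcc_text)
    return match.group(1) if match else None


# Sample lines copied from the logs posted in this thread.
smi_header = "| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |"
nvcc_line = "Cuda compilation tools, release 12.0, V12.0.140"

print("driver supports up to CUDA:", driver_cuda_version(smi_header))
print("installed toolkit release:", toolkit_cuda_version(nvcc_line))
```

The first number comes from the driver and the second from the installed toolkit, so the two can legitimately differ, as they do here (12.2 versus 12.0).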

joshgura commented 2 weeks ago

I'm trying to run A1111 version 1.9.4 (unfortunately, the A1111 installer is getting worse with every release) and get the same error, "Torch is not able to use GPU", even though webui-forge runs on this same machine. So the error message is not specific enough. It would be helpful to know what is throwing the error, what exactly it is looking for, and where.
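One way to get a more specific message is to run a CUDA check yourself in a fresh interpreter and keep its stderr. The sketch below assumes (I have not verified this against `modules/launch_utils.py` at the commits in this thread) that the launcher's test is roughly equivalent to `import torch; assert torch.cuda.is_available()` run in a subprocess. `probe_python` is a hypothetical helper, not part of the web UI.

```python
import subprocess
import sys

# Assumed to approximate the launcher's check; not copied from the web UI source.
CUDA_CHECK = "import torch; assert torch.cuda.is_available()"


def probe_python(python_exe=sys.executable, snippet=CUDA_CHECK):
    """Run `snippet` in a fresh interpreter and return (ok, stderr_text)."""
    result = subprocess.run(
        [python_exe, "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr.strip()


if __name__ == "__main__":
    ok, err = probe_python()
    print("interpreter:", sys.executable)
    print("check passed" if ok else f"check failed:\n{err}")
```

Run it with the same Python that the web UI uses (for example the one inside its venv). The captured stderr shows whether the failure is an import error, a missing library, or `torch.cuda.is_available()` returning False.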

peiwenxu commented 2 weeks ago

Please use Python 3.10 and also make sure to downgrade to numpy==1.26.4. I think numpy 2.x caused the issue.
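To check which numpy the environment actually has before downgrading, something like the following sketch can help. The helper names are made up for illustration, and it only reads package metadata, so it does not import numpy or torch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Return the leading integer of a dotted version string, e.g. '1.26.4' -> 1."""
    return int(version_string.split(".")[0])


if __name__ == "__main__":
    found = installed_version("numpy")
    if found is None:
        print("numpy is not installed in this environment")
    elif major_version(found) >= 2:
        print(f"numpy {found} found; consider pinning numpy==1.26.4")
    else:
        print(f"numpy {found} found; already on the 1.x series")
```

Run it with the interpreter the web UI launches with, since a conda environment and the web UI's own venv can hold different numpy versions.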