AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: AMD GPU not Recognized by Stable-Diffusion-WebUI #14382

Open jaminW55 opened 11 months ago

jaminW55 commented 11 months ago

Checklist

What happened?

Whenever running a prompt, the program does not use my AMD GPU or ROCm. I have checked that all drivers and components are up to date, and regardless of additional args, nothing fixes this issue.

Steps to reproduce the problem

  1. Install Program on AMD GPU Computer
  2. Run Program (enter any associated arguments)
  3. Run a prompt

What should have happened?

The program should be using my AMD GPU to generate images, but it is not; I have monitored with rocm-smi and verified this. On my original install, maybe 2-3 weeks ago, the AMD GPU was utilized just fine.
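One quick way to narrow this down is to ask the venv's own torch which backend it was built for. This is a diagnostic sketch (the `torch_gpu_report` helper is hypothetical, not part of webui); on a ROCm build, `torch.version.hip` is set and ROCm GPUs show up through the `torch.cuda` API:

```python
# Diagnostic sketch: is the venv's torch a ROCm build, and does it see a GPU?
# Run inside the webui venv. Handles a missing torch install gracefully.

def torch_gpu_report() -> str:
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    hip = getattr(torch.version, "hip", None)    # set only on ROCm builds
    cuda = getattr(torch.version, "cuda", None)  # set on CUDA builds, e.g. "12.1"
    backend = f"ROCm {hip}" if hip else (f"CUDA {cuda}" if cuda else "CPU-only")
    seen = torch.cuda.is_available()             # ROCm devices appear via the cuda API
    return f"torch {torch.__version__} ({backend}); GPU visible: {seen}"

if __name__ == "__main__":
    print(torch_gpu_report())
```

If this reports a `+cu` (CUDA) build, the GPU will never be used on an AMD box no matter which launch args are passed.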

What browsers do you use to access the UI ?

Firefox

Sysinfo

sysinfo-2023-12-20-19-53.txt

Console logs

I have additional args in webui-user.sh:

export COMMANDLINE_ARGS="--skip-torch-cuda-test --precision full --no-half"
ᐉ stable-diffusion-web-ui-server

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on jaminW55 user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc_minimal.so.4
Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]
Version: v1.6.1
Commit hash: 4afaaf8a020c1df457bcf7250cb1c7f609699fa7
Launching Web UI with arguments: --skip-torch-cuda-test --precision full --no-half
/opt/stable-diffusion-web-ui/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.1.1+cu121)
    Python  3.11.3 (you have 3.11.6)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
No module 'xformers'. Proceeding without it.
Warning: caught exception 'No CUDA GPUs are available', memory monitor disabled
Loading weights [676f0d60c8] from /opt/stable-diffusion-web-ui/models/Stable-diffusion/dreamshaperXL_turboDpmppSDE.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 3.9s (import torch: 1.1s, import gradio: 0.3s, setup paths: 1.4s, other imports: 0.3s, load scripts: 0.1s, create ui: 0.2s, gradio launch: 0.3s).
Creating model from config: /opt/stable-diffusion-web-ui/repositories/generative-models/configs/inference/sd_xl_base.yaml
Applying attention optimization: InvokeAI... done.
Model loaded in 2.5s (load weights from disk: 0.6s, create model: 0.2s, apply weights to model: 0.8s, apply float(): 0.6s, calculate empty prompt: 0.2s).

Additional information

All drivers and ROCm are up to date.

soaska commented 11 months ago

On my original install, the AMD GPU was utilized just fine too, maybe 2-3 weeks ago. I have python-pytorch-opt-rocm. The package python-torchvision-rocm doesn't compile, failing with this error:

cc1plus: warning: command-line option '-Wno-duplicate-decl-specifier' is valid for C/ObjC but not for C++
In file included from /home/sosiska/.cache/paru/clone/python-torchvision-rocm/src/vision-0.16.2/torchvision/csrc/vision.cpp:1:
/home/sosiska/.cache/paru/clone/python-torchvision-rocm/src/vision-0.16.2/torchvision/csrc/vision.h:10:40: warning: 'extern' declaration of '_register_ops' with an initializer
   10 | extern "C" VISION_INLINE_VARIABLE auto _register_ops = &cuda_version;
      |                                        ^~~~~~~~~~~~~
[100%] Linking CXX shared library libtorchvision.so
[100%] Built target torchvision
Traceback (most recent call last):
  File "/home/sosiska/.cache/paru/clone/python-torchvision-rocm/src/vision-0.16.2/setup.py", line 9, in <module>
    import torch
  File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 234, in <module>
    _load_global_deps()
  File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 193, in _load_global_deps
    raise err
  File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 174, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
==> ERROR: A failure occurred in build().
    Aborting...
error: failed to build 'python-torchvision-rocm-0.16.2-1':
error: failed to build packages: python-torchvision-rocm-0.16.2-1

When I'm trying to import torch:

python -c "import torch"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 234, in <module>
    _load_global_deps()
  File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 193, in _load_global_deps
    raise err
  File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 174, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory

My system info:

export | grep -E "(GPU_TARGETS|AMDGPU_TARGETS|PYTORCH_ROCM_ARCH|HSA_OVERRIDE_GFX_VERSION)"
  Name:                    AMD Ryzen 5 3600X 6-Core Processor
  Marketing Name:          AMD Ryzen 5 3600X 6-Core Processor
  Vendor Name:             CPU
  Chip ID:                 0(0x0)
  BDFID:                   0
  Internal Node ID:        0
  Name:                    gfx1030
  Marketing Name:          AMD Radeon RX 6900 XT
  Vendor Name:             AMD
  Chip ID:                 29631(0x73bf)
  BDFID:                   2560
  Internal Node ID:        1
      Name:                    amdgcn-amd-amdhsa--gfx1030
soaska commented 11 months ago

If you have the same problem, congrats: your torch is broken. If you have an AMD GPU, you should use the special ROCm builds of PyTorch and torchvision. If you have them installed system-wide, use a venv with system site packages. Hope it will work.

It may work differently on Windows, since that OS is weird; I don't know how it works there.

jaminW55 commented 11 months ago

If you have the same problem, congrats: your torch is broken. If you have an AMD GPU, you should use the special ROCm builds of PyTorch and torchvision. If you have them installed system-wide, use a venv with system site packages. Hope it will work.

It may work differently on Windows, since that OS is weird; I don't know how it works there.

Hello. So you are doing a local install and running a local venv environment, not using the AUR? What did you set as your launch parameters to ensure the correct ROCm/PyTorch is used?

zakusworo commented 11 months ago

(quoting the original issue report in full; duplicate text omitted)

Seems you got the wrong torch version. Just do this in a terminal in the SD webui folder:

python3 -m venv venv
source venv/bin/activate
pip uninstall torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7

Create the file below with a text editor:

#!/bin/sh

source venv/bin/activate

export HSA_OVERRIDE_GFX_VERSION=10.3.0
export HIP_VISIBLE_DEVICES=0
export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:512

python3 launch.py --enable-insecure-extension-access --opt-sdp-attention

Save that text as launch.sh. You need to edit "HSA_OVERRIDE_GFX_VERSION=10.3.0" to your exact AMD GFX version; in my case, since I'm using an RX 6800 (RDNA2), it's 10.3.0.

After that, open a terminal in your webui folder and run: bash launch.sh

That's all.
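Before launching, it may help to confirm from inside the venv that the nightly ROCm wheel actually landed and the override variable is set. This is an illustrative sketch (the `check_setup` helper is hypothetical, not part of webui):

```python
# Sanity check after the reinstall above: confirm a ROCm torch wheel is
# installed and HSA_OVERRIDE_GFX_VERSION is exported. Illustrative only.
import os

def check_setup() -> list:
    problems = []
    try:
        import torch
        if getattr(torch.version, "hip", None) is None:
            problems.append(f"torch {torch.__version__} is not a ROCm build")
    except ImportError:
        problems.append("torch is not installed in this venv")
    if "HSA_OVERRIDE_GFX_VERSION" not in os.environ:
        problems.append("HSA_OVERRIDE_GFX_VERSION is not set (RDNA2 cards want 10.3.0)")
    return problems

if __name__ == "__main__":
    print(check_setup() or "setup looks OK")
```

Run it with `source venv/bin/activate && python check_setup.py` after sourcing launch.sh's exports; an empty problem list means the prerequisites for the fix above are in place.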

soaska commented 11 months ago

If you have the same problem, congrats: your torch is broken. If you have an AMD GPU, you should use the special ROCm builds of PyTorch and torchvision. If you have them installed system-wide, use a venv with system site packages. Hope it will work.

It may work differently on Windows, since that OS is weird; I don't know how it works there.

Hello. So you are doing a local install and running a local venv environment, not using the AUR? What did you set as your launch parameters to ensure the correct ROCm/PyTorch is used?

Yes, I'm using a local install with a shared venv. Install pip:

paru -S python-pip

Install the ROCm build of PyTorch from pacman:

paru -S python-pytorch-opt-rocm

If you get the libmkl_intel_lp64.so.2 error when you run

python -c "import torch"

install the intel-mkl pacman package:

paru -S intel-mkl
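A quick way to confirm the missing library is now resolvable, without importing torch at all, is to try loading it directly. This is an illustrative sketch (the `mkl_resolves` helper is hypothetical; the soname comes from the traceback above):

```python
# Check whether the MKL shared library that torch's loader wanted can now be
# dlopen'd. The soname is the one from the OSError traceback above.
import ctypes

def mkl_resolves(soname: str = "libmkl_intel_lp64.so.2") -> bool:
    try:
        ctypes.CDLL(soname)  # same mechanism torch._load_global_deps uses
        return True
    except OSError:
        return False

if __name__ == "__main__":
    print("MKL resolvable:", mkl_resolves())
```

If this still prints False after installing intel-mkl, the library directory is likely missing from the dynamic linker's search path.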

Install the ROCm build of torchvision from the AUR:

paru -S python-torchvision-rocm

Then clone the project and create a venv that can see your system torch packages:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
python -m venv venv --system-site-packages
source venv/bin/activate
pip install -r requirements.txt

Run webui.sh and wait.

You will get a lot of CUDA/xformers errors and such, but it works fine.

Be careful with extensions.
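Since the whole point of `--system-site-packages` is that the venv resolves the pacman-installed ROCm torch rather than a pip-installed CUDA wheel, it can be worth verifying where torch actually loads from. A minimal sketch (the `where_is_torch` helper is hypothetical, and the `/usr/lib` path check assumes Arch's packaging layout):

```python
# Verify the venv inherits the system ROCm torch instead of a venv-local
# CUDA wheel. Assumes Arch packages install under /usr/lib.

def where_is_torch() -> str:
    try:
        import torch
    except ImportError:
        return "torch is not importable from this interpreter"
    origin = "system site-packages" if torch.__file__.startswith("/usr/lib") else "this venv"
    rocm = getattr(torch.version, "hip", None) is not None
    return f"torch loads from {origin}; ROCm build: {rocm}"

if __name__ == "__main__":
    print(where_is_torch())
```

If it reports a venv-local torch, webui's launcher probably pip-installed a CUDA wheel over the top; uninstall it inside the venv so the system package wins.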

nuclear314 commented 11 months ago

For a fix for AMD cards, see my comment on a newer post: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/14462#issuecomment-1872405104

It basically boils down to: you need to install torch-directml, reinstall torch, and force DirectML usage.

soaska commented 11 months ago

For a fix for AMD cards, see my comment on a newer post: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/14462#issuecomment-1872405104

It basically boils down to: you need to install torch-directml, reinstall torch, and force DirectML usage.

And get a ROCm error. Nice.

jaminW55 commented 9 months ago

So I have been reading that it is an issue with ROCm and PyTorch, but that there is a version of PyTorch you could use in a sealed-off venv that would fix the issue, at least until ROCm/AMD fixes their issue with PyTorch.

What version of Pytorch is last known to work? I'll try the workaround and report back.

jaminW55 commented 9 months ago

(quoting soaska's install steps from the earlier comment; duplicate text omitted)

I followed this, but to no avail; the webui still fails to use my GPU.

LiBoHanse commented 9 months ago

You have an AMD GPU, but "torch_version": "2.1.1+cu121". How is that supposed to get your AMD GPU working?

soaska commented 9 months ago

you have amdgpu but

"torch_version": "2.1.1+cu121"

how is that supposed to get your amdgpu working?

Because PyTorch CUDA + ROCm works faster than plain PyTorch ROCm. Much faster. But for artificial intelligence purposes, Nvidia is currently better.