Open jaminW55 opened 11 months ago
On my original install, the AMD GPU was utilized just fine, maybe 2-3 weeks ago too. I have python-pytorch-opt-rocm. The python-torchvision-rocm package doesn't compile, failing with this error:
cc1plus: warning: command-line option «-Wno-duplicate-decl-specifier» is valid for C/ObjC but not for C++
In file included from /home/sosiska/.cache/paru/clone/python-torchvision-rocm/src/vision-0.16.2/torchvision/csrc/vision.cpp:1:
/home/sosiska/.cache/paru/clone/python-torchvision-rocm/src/vision-0.16.2/torchvision/csrc/vision.h:10:40: warning: 'extern' declaration of '_register_ops' with an initializer
10 | extern "C" VISION_INLINE_VARIABLE auto _register_ops = &cuda_version;
| ^~~~~~~~~~~~~
[100%] Linking CXX shared library libtorchvision.so
[100%] Built target torchvision
Traceback (most recent call last):
File "/home/sosiska/.cache/paru/clone/python-torchvision-rocm/src/vision-0.16.2/setup.py", line 9, in <module>
import torch
File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 234, in <module>
_load_global_deps()
File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 193, in _load_global_deps
raise err
File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 174, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
==> ERROR: A failure occurred in build().
Aborting...
error: failed to build 'python-torchvision-rocm-0.16.2-1':
error: failed to build packages: python-torchvision-rocm-0.16.2-1
When I try to import torch:
python -c "import torch"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 234, in <module>
_load_global_deps()
File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 193, in _load_global_deps
raise err
File "/usr/lib/python3.11/site-packages/torch/__init__.py", line 174, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/usr/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory
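For context, torch's `_load_global_deps` is essentially just calling `ctypes.CDLL(...)` on its bundled libraries, so the failure in the traceback can be reproduced and inspected directly. A minimal sketch (the helper name `try_load` is mine, not torch's):

```python
import ctypes

def try_load(lib_name):
    """Attempt to dlopen a shared library the way torch's
    _load_global_deps does; return the error text on failure."""
    try:
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
        return None
    except OSError as err:
        return str(err)

# On an affected system this prints the same "cannot open shared
# object file: No such file or directory" message as the traceback above:
print(try_load("libmkl_intel_lp64.so.2"))
```

Any library name the dynamic linker cannot resolve produces the same `OSError`, which is why installing the package that provides `libmkl_intel_lp64.so.2` (intel-mkl, as mentioned later in this thread) makes the import succeed.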
My system info:
export | grep -E "(GPU_TARGETS|AMDGPU_TARGETS|PYTORCH_ROCM_ARCH|HSA_OVERRIDE_GFX_VERSION)"
Name: AMD Ryzen 5 3600X 6-Core Processor
Marketing Name: AMD Ryzen 5 3600X 6-Core Processor
Vendor Name: CPU
Chip ID: 0(0x0)
BDFID: 0
Internal Node ID: 0
Name: gfx1030
Marketing Name: AMD Radeon RX 6900 XT
Vendor Name: AMD
Chip ID: 29631(0x73bf)
BDFID: 2560
Internal Node ID: 1
Name: amdgcn-amd-amdhsa--gfx1030
If you have the same problem, congrats! Your torch is broken. If you have an AMD GPU you should use the special ROCm build of PyTorch and torchvision. If you have it installed system-wide, use a venv with system site packages. Hope it will work.
Maybe it works differently on Windows, because that OS is weird; I don't know how it works there.
Hello. So you are doing a local install and running a local venv environment, not using the AUR? What did you set as your launch parameters to ensure the correct ROCm/PyTorch is used?
Checklist
- [x] The issue exists after disabling all extensions
- [x] The issue exists on a clean installation of webui
- [x] The issue is caused by an extension, but I believe it is caused by a bug in the webui
- [x] The issue exists in the current version of the webui
- [x] The issue has not been reported before recently
- [x] The issue has been reported before but has not been fixed yet
What happened?
Whenever running a prompt, the program does not use my AMD GPU or ROCm. I have checked that all drivers and components are up to date, and regardless of additional args nothing fixes this issue.
Steps to reproduce the problem
- Install Program on AMD GPU Computer
- Run Program (enter any associated arguments)
- Run a prompt
What should have happened?
This should be using my AMD GPU to generate images, but it is not. I have monitored with rocm-smi, and verified this is the case. On my original install, AMD GPU was utilized just fine, maybe 2-3 weeks ago.
What browsers do you use to access the UI ?
Firefox
Sysinfo
sysinfo-2023-12-20-19-53.txt
Console logs
I have additional args in webui-user.sh:
export COMMANDLINE_ARGS="--skip-torch-cuda-test --precision full --no-half"
ᐉ stable-diffusion-web-ui-server
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################
################################################################
Running on jaminW55 user
################################################################
################################################################
Repo already cloned, using it as install directory
################################################################
################################################################
Create and activate python venv
################################################################
################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc_minimal.so.4
Python 3.11.6 (main, Nov 14 2023, 09:36:21) [GCC 13.2.1 20230801]
Version: v1.6.1
Commit hash: 4afaaf8a020c1df457bcf7250cb1c7f609699fa7
Launching Web UI with arguments: --skip-torch-cuda-test --precision full --no-half
/opt/stable-diffusion-web-ui/venv/lib/python3.11/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.1.1+cu121)
    Python 3.11.3 (you have 3.11.6)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
No module 'xformers'. Proceeding without it.
Warning: caught exception 'No CUDA GPUs are available', memory monitor disabled
Loading weights [676f0d60c8] from /opt/stable-diffusion-web-ui/models/Stable-diffusion/dreamshaperXL_turboDpmppSDE.safetensors
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 3.9s (import torch: 1.1s, import gradio: 0.3s, setup paths: 1.4s, other imports: 0.3s, load scripts: 0.1s, create ui: 0.2s, gradio launch: 0.3s).
Creating model from config: /opt/stable-diffusion-web-ui/repositories/generative-models/configs/inference/sd_xl_base.yaml
Applying attention optimization: InvokeAI... done.
Model loaded in 2.5s (load weights from disk: 0.6s, create model: 0.2s, apply weights to model: 0.8s, apply float(): 0.6s, calculate empty prompt: 0.2s).
Additional information
All drivers and ROCm are up to date.
Seems you got the wrong torch version. Just run this in a terminal, in the SD webui folder:
python3 -m venv venv
source venv/bin/activate
pip uninstall torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
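After reinstalling, you can check which build actually landed in the environment: `torch.version.hip` is set only on ROCm builds (it is `None` on CUDA/CPU wheels, such as the 2.1.1+cu121 seen later in this thread). A hedged sketch; the helper name is mine:

```python
def rocm_build_info():
    """Return torch's HIP/ROCm version string, or None if torch is
    missing or is not a ROCm build."""
    try:
        import torch
    except ImportError:
        return None  # torch not installed in this environment
    return getattr(torch.version, "hip", None)

# On a working ROCm install this prints a version string like "5.7...";
# on a CUDA or CPU wheel it prints None.
print(rocm_build_info())
```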
Create the file below with a text editor:
source venv/bin/activate
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export HIP_VISIBLE_DEVICES=0
export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:512
python3 launch.py --enable-insecure-extension-access --opt-sdp-attention
Save that text as launch.sh. You need to edit "HSA_OVERRIDE_GFX_VERSION=10.3.0" to your exact AMD GFX version; in my case, since I'm using an RX 6800 (RDNA2), it's 10.3.0.
After that, open a terminal in your webui folder and just run: bash launch.sh
That's all.
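The gfx-arch-to-override mapping is not documented anywhere official; the values below are the ones commonly reported in community threads, so treat them as assumptions to verify against your own `rocminfo` output (like the `gfx1030` shown earlier in this issue):

```python
# Community-reported HSA_OVERRIDE_GFX_VERSION values per gfx arch
# (unofficial; verify against your own rocminfo output).
GFX_OVERRIDES = {
    "gfx1030": "10.3.0",  # RDNA2: RX 6800 / 6900 XT (native, no spoofing)
    "gfx1031": "10.3.0",  # RX 6700 XT: spoofed to gfx1030
    "gfx1032": "10.3.0",  # RX 6600 family: spoofed to gfx1030
    "gfx1100": "11.0.0",  # RDNA3: RX 7900 XT / XTX
}

def override_for(arch):
    """Suggested HSA_OVERRIDE_GFX_VERSION for a rocminfo gfx name."""
    return GFX_OVERRIDES.get(arch)

print(override_for("gfx1030"))  # → 10.3.0
```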
Yes, I'm using a local install with a shared venv. Install pip:
paru -S python-pip
Install PyTorch ROCm from pacman:
paru -S python-pytorch-opt-rocm
If you get a libmkl_intel_lp64.so.2 error when you run
python -c "import torch"
install the intel-mkl pacman package:
paru -S intel-mkl
Install torchvision ROCm from the AUR:
paru -S python-torchvision-rocm
Then get the project and create a venv with your torch packages:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
python -m venv venv --system-site-packages
source venv/bin/activate
pip install -r requirements.txt
Run webui.sh and wait.
You will get a lot of CUDA/xformers errors etc., but it works fine.
Be careful with extensions
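The venv step above can be sanity-checked: a venv created with `--system-site-packages` records that flag in its `pyvenv.cfg`, which is the easiest way to confirm that the system-wide ROCm torch will be importable inside it. A sketch under those assumptions (the temp path and `--without-pip` are just to keep the example self-contained; the real venv lives in the webui folder):

```python
# Verify that a venv created with --system-site-packages (as in the
# recipe above) will expose system packages such as the pacman-installed
# python-pytorch-opt-rocm.
import pathlib
import subprocess
import sys
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
subprocess.run(
    [sys.executable, "-m", "venv", "--system-site-packages",
     "--without-pip", str(root / "venv")],
    check=True,
)
# venv writes the flag into pyvenv.cfg; "true" means imports inside the
# venv fall back to the system site-packages.
cfg = (root / "venv" / "pyvenv.cfg").read_text()
print("include-system-site-packages = true" in cfg)  # → True
```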
For a fix for AMD cards, see my comment on a newer post: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/14462#issuecomment-1872405104
It basically boils down to: you need to install torch-directml, reinstall torch, and force DirectML usage.
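For the DirectML route (Windows only), the usual pattern is to ask `torch_directml` for a device and fall back to CPU when the package is not installed. A hedged sketch; `pick_device` is my name for it, and `torch_directml.device()` is the package's entry point:

```python
def pick_device():
    """Return a DirectML device when torch-directml is installed
    (Windows), otherwise fall back to plain CPU."""
    try:
        import torch_directml  # only present with `pip install torch-directml`
        return torch_directml.device()
    except ImportError:
        return "cpu"

# Prints a DirectML device on a torch-directml install, "cpu" elsewhere:
print(pick_device())
```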
And get rocm error. Nice
So I have been reading that it is an issue with ROCm and PyTorch, but that there is a version of PyTorch you could use in a closed-off venv that would fix the issue, at least until ROCm/AMD fix their issue with PyTorch.
What version of Pytorch is last known to work? I'll try the workaround and report back.
I followed this, but to no avail; my GPU is still not being detected.
you have amdgpu but
"torch_version": "2.1.1+cu121"
how is that supposed to get your amdgpu working?
Because pytorch cuda + rocm works faster than pytorch rocm. Much faster. But for artificial intelligence purposes Nvidia is currently better.