Closed MRtecno98 closed 1 year ago
You need to downgrade hsa-rocr to 5.3.0-2 (you can use the downgrade helper: https://aur.archlinux.org/packages/downgrade). arch4edu updated hsa-rocr without actually checking whether it breaks anything.
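For reference, a sketch of doing that with the downgrade helper (it pulls old package versions from your pacman cache or the Arch Linux Archive, so the exact version list you are offered may differ):

```
$ sudo downgrade hsa-rocr
# pick 5.3.0-2 from the list it presents, and optionally let it
# add hsa-rocr to IgnorePkg so pacman doesn't re-upgrade it
```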
I also have the same issue, running Arch Linux with an AMD 5600G, an RX 570 4GB, and 32GB of RAM.
I manually downgraded to hsa-rocr 5.3.0-2 using makepkg -si and the PKGBUILD for 5.3.0-2, since the downgrade package didn't offer 5.3.0-2 as an option. But it did not work: it segfaults around the 2 minute mark, and the screen flashes black for an instant when it happens. I'm using the same command line arguments as MRtecno98.
For context, I followed this guide after the wiki instructions resulted in "Torch is not able to use GPU". I also tried this and this, but they didn't work either. On Windows this works but it's slow: 5 minutes per picture, and using the CPU with stable-diffusion-webui takes around 8 minutes. So either there is a bug or I missed something, since I have been trying to get the AMD GPU to work for several hours now with different install methods, and all the steps are jumbled up in my mind.
Same happens to me too. Downgrading to hsa-rocr 5.3.0-2 did not help. The script hangs as soon as it hits the Global Step: xxxxxx part.
No error message or anything; the kernel kills it 30-60 seconds later.
I tried both the arch4edu and pip methods. It used to work with arch4edu a few updates ago, by the way, so I'm guessing it's an issue with something they updated.
Not sure if it's worth mentioning, but I tried both Stable Diffusion's model and Waifu Diffusion's model. Neither works.
AFAIK you still need PyTorch packages specifically compiled for gfx803: https://github.com/xuhuisheng/rocm-gfx803. The drawback is that these are only for Python 3.8. They're what works for me on my RX 570.
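A rough sketch of using those builds (the wheel filenames below are placeholders, not the real ones; download the actual Python 3.8 wheels from that repo's releases page first):

```
$ python3.8 -m venv venv
$ source venv/bin/activate
$ pip install ./torch-<version>-cp38-*.whl ./torchvision-<version>-cp38-*.whl
```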
Trying rabidcopy's suggestion, following those instructions on a fresh Ubuntu 20.04.5 install fails with:

```
OSError: libmpi_cxx.so.40: cannot open shared object file: No such file or directory
```

Looking at the repo issues, there seems to be a solution provided by tmpuserx; however, stable diffusion still does not work and crashes with:
```
Python 3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0]
Commit hash: 44c46f0ed395967cd3830dd481a2db759fda5b3b
Traceback (most recent call last):
  File "launch.py", line 294, in <module>
    prepare_enviroment()
  File "launch.py", line 209, in prepare_enviroment
    run_python("import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'")
  File "launch.py", line 73, in run_python
    return run(f'"{python}" -c "{code}"', desc, errdesc)
  File "launch.py", line 49, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: "/home/jose/Downloads/stable-diffusion-webui/venv/bin/python3" -c "import torch; assert torch.cuda.is_available(), 'Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check'"
Error code: 134
stdout: <empty>
stderr: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
```
For some reason using ROC_ENABLE_PRE_VEGA=1 and HSA_OVERRIDE_GFX_VERSION=10.3.0 doesn't seem to work.
@thesandwichman294 Are you entering the following in your shell before running the app?
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/hip/lib
This temporarily sets the environment variable LD_LIBRARY_PATH, which tells the dynamic linker where to find additional libraries like libmpi.
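To check that the variable actually picked up the directory in your current shell (a minimal sketch, assuming the default /opt/rocm install prefix):

```shell
# Append the ROCm HIP library directory to the dynamic linker search path
# for this shell session, then verify it is present.
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/rocm/hip/lib"
echo "$LD_LIBRARY_PATH" | grep -q '/opt/rocm/hip/lib' && echo "ok"
```

Note this only lasts for the current session; put the export in your shell profile or the webui launch script to make it stick.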
It seems that this is a problem with the official PyTorch build: it is built without support for gfx803 (the family the RX 580 belongs to). I use Arch Linux and the PyTorch package from the official repositories, which is built with gfx803 support. It is built for Python 3.11, but it worked without problems in my case. So if you use Arch Linux, you can install PyTorch from the official Arch Linux repositories instead of installing a build from the PyTorch developers.
If you have a CPU without AVX2 support:

```
# pacman -S python-pytorch-rocm
```

If you have a CPU with AVX2 support:

```
# pacman -S python-pytorch-opt-rocm
```

and TorchVision:

```
# pacman -S python-torchvision
```

Then install virtualenv and recreate the venv so it can see the system packages:

```
# pacman -S virtualenv
$ cd stable-diffusion-webui/
$ rm -rf venv
$ virtualenv --system-site-packages venv
```
Add the following to webui-user.sh to disable installation of PyTorch by webui.sh:

```
export TORCH_COMMAND="pip"
```

Then run webui.sh.
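As a sanity check before launching, you can confirm that the --system-site-packages venv actually sees the system ROCm build of PyTorch (torch.version.hip is only set on ROCm builds, so it should print a HIP version rather than None):

```
$ ./venv/bin/python -c "import torch; print(torch.__version__, torch.version.hip)"
```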
Using ROCm version 5.5.0 fixed the segfault for me (RX 580):

```
TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'
python3 launch.py --precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test
```

In this case webui.sh does not need to be touched.
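If you'd rather not set this up by hand on every launch, the same overrides can live in webui-user.sh, which the launcher sources on startup (COMMANDLINE_ARGS is the variable that file already provides for launch flags; this is just a sketch of that approach):

```
# webui-user.sh
export TORCH_COMMAND='pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.5.0'
export COMMANDLINE_ARGS='--precision full --no-half --opt-sub-quad-attention --lowvram --disable-nan-check --skip-torch-cuda-test'
```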
Is there an existing issue for this?
What happened?
I installed the HIP/ROCm stack on a fresh installation of Manjaro (using binaries from arch4edu), and rocminfo correctly recognizes my GPU (and CPU). When running the webui, it doesn't complain about any missing GPU or CUDA support, until it tries to load a model, at which point it segfaults.
Steps to reproduce the problem
What should have happened?
well, not segfault?
Commit where the problem happens
3596af07493ab7981ef92074f979eeee8fa624c4
What platforms do you use to access the UI?
Linux
What browsers do you use to access the UI?
Google Chrome
Command Line Arguments
Additional information, context and logs
rocminfo output:
Webui log before it segfaults:
(my system is in italian, "Scrivania" in the directory names means Desktop)
I found the env flags I put on the command line in various GitHub issues. Without
HSA_OVERRIDE_GFX_VERSION=10.3.0
torch doesn't even recognize that CUDA is available.
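Put together, the launch line I'm describing looks something like this (flags collected from the comments above, prefixed as one-shot environment variables; your mileage may vary on gfx803 cards):

```
$ ROC_ENABLE_PRE_VEGA=1 HSA_OVERRIDE_GFX_VERSION=10.3.0 python launch.py --precision full --no-half
```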