AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Image generation won't start forever (Linux+ROCm, possibly specific to RX 5000 series) #10855

Open cyatarow opened 1 year ago

cyatarow commented 1 year ago

Is there an existing issue for this?

What happened?

I did a fresh install of v1.3.0, but image generation never starts, even many minutes after pressing the "Generate" button.

Steps to reproduce the problem

  1. Launch the UI with webui.sh
  2. Go to http://127.0.0.1:7860 with a browser
  3. Press "Generate" for any prompt or model

What should have happened?

Image generation should have started.

Commit where the problem happens

20ae71faa8ef035c31aa3a410b707d792c8203a3

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Linux

What device are you running WebUI on?

AMD GPUs (RX 5000 below)

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

`--ckpt-dir` and `--vae-dir`
I'm using external storage to place model files.

List of extensions

(None)

Console logs

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on sd-amd user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc.so.4
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]
Version: v1.3.0
Commit hash: 20ae71faa8ef035c31aa3a410b707d792c8203a3
Installing torch and torchvision
Looking in indexes: https://download.pytorch.org/whl/rocm5.4.2
Collecting torch==2.0.1+rocm5.4.2
  Using cached https://download.pytorch.org/whl/rocm5.4.2/torch-2.0.1%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl (1536.4 MB)
Collecting torchvision==0.15.2+rocm5.4.2
  Using cached https://download.pytorch.org/whl/rocm5.4.2/torchvision-0.15.2%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl (62.4 MB)
Collecting filelock
  Using cached https://download.pytorch.org/whl/filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting networkx
  Using cached https://download.pytorch.org/whl/networkx-3.0-py3-none-any.whl (2.0 MB)
Collecting sympy
  Using cached https://download.pytorch.org/whl/sympy-1.11.1-py3-none-any.whl (6.5 MB)
Collecting pytorch-triton-rocm<2.1,>=2.0.0
  Using cached https://download.pytorch.org/whl/pytorch_triton_rocm-2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78.4 MB)
Collecting jinja2
  Using cached https://download.pytorch.org/whl/Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting typing-extensions
  Using cached https://download.pytorch.org/whl/typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting requests
  Using cached https://download.pytorch.org/whl/requests-2.28.1-py3-none-any.whl (62 kB)
Collecting pillow!=8.3.*,>=5.3.0
  Using cached https://download.pytorch.org/whl/Pillow-9.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting numpy
  Using cached https://download.pytorch.org/whl/numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting cmake
  Using cached https://download.pytorch.org/whl/cmake-3.25.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.7 MB)
Collecting lit
  Using cached https://download.pytorch.org/whl/lit-15.0.7.tar.gz (132 kB)
  Preparing metadata (setup.py) ... done
Collecting MarkupSafe>=2.0
  Using cached https://download.pytorch.org/whl/MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting certifi>=2017.4.17
  Using cached https://download.pytorch.org/whl/certifi-2022.12.7-py3-none-any.whl (155 kB)
Collecting idna<4,>=2.5
  Using cached https://download.pytorch.org/whl/idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached https://download.pytorch.org/whl/urllib3-1.26.13-py2.py3-none-any.whl (140 kB)
Collecting charset-normalizer<3,>=2
  Using cached https://download.pytorch.org/whl/charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting mpmath>=0.19
  Using cached https://download.pytorch.org/whl/mpmath-1.2.1-py3-none-any.whl (532 kB)
Using legacy 'setup.py install' for lit, since package 'wheel' is not installed.
Installing collected packages: mpmath, lit, cmake, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, pytorch-triton-rocm, torch, torchvision
  Running setup.py install for lit ... done
Successfully installed MarkupSafe-2.1.2 certifi-2022.12.7 charset-normalizer-2.1.1 cmake-3.25.0 filelock-3.9.0 idna-3.4 jinja2-3.1.2 lit-15.0.7 mpmath-1.2.1 networkx-3.0 numpy-1.24.1 pillow-9.3.0 pytorch-triton-rocm-2.0.1 requests-2.28.1 sympy-1.11.1 torch-2.0.1+rocm5.4.2 torchvision-0.15.2+rocm5.4.2 typing-extensions-4.4.0 urllib3-1.26.13
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/taming-transformers...
Cloning K-diffusion into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/k-diffusion...
Cloning CodeFormer into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/CodeFormer...
Cloning BLIP into /home/sd-amd/sd-ui-130/stable-diffusion-webui/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --ckpt-dir /mnt/W20/Stable_Diffusion/MODEL --vae-dir /mnt/W20/Stable_Diffusion/VAE
No module 'xformers'. Proceeding without it.
Calculating sha256 for /mnt/W20/Stable_Diffusion/MODEL/AnythingV5_v5PrtRE.safetensors: Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 2.7s (import torch: 0.5s, import gradio: 0.6s, import ldm: 0.6s, other imports: 0.4s, load scripts: 0.3s, create ui: 0.2s).
7f96a1a9ca9b3a3242a9ae95d19284f0d2da8d5282b42d2d974398bf7663a252
Loading weights [7f96a1a9ca] from /mnt/W20/Stable_Diffusion/MODEL/AnythingV5_v5PrtRE.safetensors
Creating model from config: /home/sd-amd/sd-ui-130/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying optimization: sdp-no-mem... done.
Textual inversion embeddings loaded(0): 
Model loaded in 4.0s (calculate hash: 2.7s, create model: 0.3s, apply weights to model: 0.3s, apply half(): 0.2s, load VAE: 0.1s, move model to device: 0.2s).

Additional information

My environment:

olinorwell commented 1 year ago

I have exactly the same issue, used to work perfectly before.

Like you say, it just sits there and doesn't do anything, no errors anywhere.

I've uninstalled/reinstalled everything and tried various different combinations, no good.

Previously I would get the classic "MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade." warning, but after about a minute it would start working and then run correctly. Now I don't get that warning at all, which suggests that this is the point where it falters.

I'm using an AMD Radeon RX 5700 XT (8GB), Ryzen 3700 CPU, Arch Linux. So similar to you but not exactly the same.

Fingers crossed somebody can suggest something! Previously on this system I've had SD working well through all the updates from September last year to a couple of weeks ago.

HoCoK31 commented 1 year ago

Same issue: no errors, just nothing generating. AMD Radeon RX 5700 XT, Ryzen 3600, Manjaro, kernel 6.3.4-2.

cyatarow commented 1 year ago

Could it be that this problem is specific to the RX 5000 series?

olinorwell commented 1 year ago

I fear it might be related to the fact that the 5000 series wasn't supposed to work originally, but then we got a workaround to do with 'fooling something' into believing it was a different chip, after which it then worked. Perhaps that trick isn't working now, and it's just unable to function. There must be many others in the same situation out there. Hopefully they will all comment on this post.

olinorwell commented 1 year ago

To confirm to anyone trying to help - at least in my case it used to immediately give the warning: "MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade."

This no longer happens. So whatever changed happens after the Generate button is hit and before the warning would be output.

[Edit: Additionally, I ran the PyTorch tests found here - https://pytorch.org/get-started/locally/ - which suggests that PyTorch ROCm is working as expected]

[Edit 2: Not sure if it's useful to know, but I did recently install OpenCL on my machine, and I've read that the OpenCL and HIP backends are potentially not compatible side-by-side when using ROCm. I don't fully understand all of this, but my gut feeling is it could be something to do with that - then again, maybe others haven't recently installed OpenCL]

cyatarow commented 1 year ago

In fact, inspired by this PR, I had tried the dev branch shortly before v1.3.0 was released. But the result was the same...

The participants in the PR were only RX 6000 users, and I think it was merged without proper verification on the 5000 series.

olinorwell commented 1 year ago

I agree, I fear that change is what has broken it for RX 5000 users. According to that PR it was needed because old versions are no longer available on the pytorch repos. I wonder if they are still available elsewhere. I fear we're going to need torch 1.13 again, avoiding the 2.0 version, which doesn't appear to work. It's at times like this that I really get mad at myself for updating anything! It was all working so well.

VekuDazo commented 1 year ago

But I have the exact same issue on the 6600M (gfx1031?) with a Ryzen 7 5800H. Without --medvram, it doesn't proceed past "Applying optimization: sdp-no-mem... done." With it, the model loads, but nothing generates and nothing else happens in the terminal.

ethragur commented 1 year ago

Same here (RX 5700) with ROCm 5.5. The only solution for now is to force a downgrade to torch 1.13.1: `pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2`

Has anyone tried a torch 2.0 build for ROCm 5.5? For now the newest one in nightly is still 5.4.2: https://download.pytorch.org/whl/nightly/torch/

olinorwell commented 1 year ago

Even force-downgrading was failing for me. The instructions I had included a '+rocm' suffix next to the package versions; when I tried without it, pip appeared to download the Nvidia versions.

What would be the way to try the 5.5 version? I can try that now.

ethragur commented 1 year ago

What would be the way to try the 5.5 version? I can try that now.

You would have to build pytorch yourself with the ROCm 5.5 version. Maybe something like #9591, the docker image they use does not exist anymore, but the one from the official pytorch docker repo could still work (https://hub.docker.com/r/rocm/pytorch/tags)

rocm/pytorch:rocm5.5_ubuntu20.04_py3.8_pytorch_staging

But I'm not really sure if that would make it work, even if we'd be able to compile it, maybe there is something that doesn't work in the new pytorch version with rx5X00 graphics cards.

Even force-downgrading was failing for me. The instructions I had included a '+rocm' suffix next to the package versions; when I tried without it, pip appeared to download the Nvidia versions.

Maybe you had '--extra-index-url' instead of '--index-url'. You could also just go into your venv directory, stable-diffusion-webui/venv/lib/python3.10/site-packages, and delete torch & torchvision. Afterwards you should just be able to use my pip install command.

Additionally, I added `export TORCH_COMMAND="pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2"` to my webui-user.sh, and I started the webui with ./webui.sh
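For reference, that override can be kept as a webui-user.sh fragment along these lines (a sketch of the setup described above; note that shell assignments must have no space after the `=`):

```shell
# webui-user.sh fragment (sketch): pin torch/torchvision to the last
# ROCm 5.2 builds, which still work on Navi 1x (RX 5000) cards.
# No space is allowed after '=' in shell variable assignments.
export TORCH_COMMAND="pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2"
```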

olinorwell commented 1 year ago

(venv) [oli@ARCH-RYZEN stable-diffusion-webui]$ pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2
Looking in indexes: https://download.pytorch.org/whl/rocm5.2
ERROR: Could not find a version that satisfies the requirement torch==1.13.1 (from versions: none)
ERROR: No matching distribution found for torch==1.13.1

I wonder if the fact they bumped the Python version up to 3.11 makes a difference? I see you were running 3.10.

ethragur commented 1 year ago

I wonder if the fact they bumped the Python version up to 3.11 makes a difference? I see you were running 3.10.

It looks like it: https://download.pytorch.org/whl/rocm5.2/torch/ - pytorch seems to only have builds up to 3.10.

olinorwell commented 1 year ago

I'm retrying now with 3.10. Fingers crossed.

ethragur commented 1 year ago

Otherwise you could try to download the .whl file and just install it directly with pip:

pip install /path/to/file.whl
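As an illustration, a cp310 wheel on the rocm5.2 index follows the naming scheme visible in the install log earlier in this thread; the exact filename and URL should be verified against https://download.pytorch.org/whl/rocm5.2/torch/ before use:

```shell
# Manual wheel install (sketch): fetch the Python 3.10 ROCm 5.2 wheel
# and install it directly, bypassing pip's index resolution.
# Verify the filename against the index page first.
wget https://download.pytorch.org/whl/rocm5.2/torch-1.13.1%2Brocm5.2-cp310-cp310-linux_x86_64.whl
pip install ./torch-1.13.1+rocm5.2-cp310-cp310-linux_x86_64.whl
```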

olinorwell commented 1 year ago

Success! @ethragur is the hero, his solution has worked for me. I'm now running v1.3.0 of A1111 on my 5700XT.

My solution was this - ensure you have Python 3.10 and edit the webui.sh file to make sure it uses Python 3.10.

Run webui.sh and let it create the venv etc and then fail to create an image.

Run: source venv/bin/activate

Then run (thanks to @ethragur) pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2

Now restart webui.sh and this time image generation will succeed, you'll see at the bottom of A1111 that the version number says "torch: 1.13.1+rocm5.2".

Hopefully what has worked for me will work for others too, thanks again to @ethragur for the help - I was getting very down at not having SD to play with!
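The steps above can be condensed into a single shell session (a sketch only: it assumes the repo checkout is the current directory, that webui.sh creates its venv in the default location, and that Python 3.10 is the interpreter webui.sh picks up):

```shell
# Consolidated RX 5000 workaround (sketch, paths assumed):
cd stable-diffusion-webui
./webui.sh                       # first run creates venv/; generation will fail
source venv/bin/activate         # enter the venv webui.sh created
pip install torch==1.13.1 torchvision==0.14.1 \
    --index-url https://download.pytorch.org/whl/rocm5.2
deactivate
./webui.sh                       # UI footer should now read "torch: 1.13.1+rocm5.2"
```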

ethragur commented 1 year ago

Perfect, good to hear that it works again. Hopefully some future builds of pytorch will also work again with the rx5000 series, otherwise we'll be stuck on this version forever :cry:. From what I've seen, 2.0 should give some performance improvements.

I'll try building the new version in a docker container, and if it works I'll upload the .whl file somewhere. But I do not have high hopes. Maybe there is some way to get more debug information out of pytorch to see where it is stuck.

cyatarow commented 1 year ago

Have any contributors noticed this issue?

cyatarow commented 1 year ago

v1.3.1, released yesterday, doesn't seem to have this fix... too bad.

cyatarow commented 1 year ago

@AUTOMATIC1111 please don't ignore us...

magusman52 commented 1 year ago

Same issue, 5700 XT, both on torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred.

EDIT: It started generating the entire prompt in a couple of seconds, after waiting for 2 minutes. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent.

olinorwell commented 1 year ago

Same issue, 5700 XT, both on torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred.

EDIT: It started generating the entire prompt in a couple of seconds, after waiting for 2 minutes. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent.

Is this Windows or Linux?

For me it was cut and dry, torch 2.0 doesn't work, torch 1.13.1 does. Perhaps check versions, etc? I always have a one minute delay before generations begin each time, but that's been like that since the beginning, and after it's done what it needs to do then I don't experience problems afterwards.

magusman52 commented 1 year ago

Same issue, 5700 XT, both on torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred. EDIT: It started generating the entire prompt in a couple of seconds, after waiting for 2 minutes. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent.

Is this Windows or Linux?

For me it was cut and dry, torch 2.0 doesn't work, torch 1.13.1 does. Perhaps check versions, etc? I always have a one minute delay before generations begin each time, but that's been like that since the beginning, and after it's done what it needs to do then I don't experience problems afterwards.

I'm on Ubuntu 22.04. And yes, it occurs with both versions of torch. The prompt loads for a minute or two, the first 90% of the gen gets done in a couple of seconds, it gets stuck at 97% again for a while, and then it finishes. Also, my system seems to get really unstable after prompting, as if it's about to crash or blackscreen. Quite odd.

EDIT: Tested again, now it only occurs on torch 2.0. Works alright on 1.13.1 besides the initial lag.

DGdev91 commented 1 year ago

I made a PR to force pytorch 1.13.1 for RX 5000 cards; it also checks for Python <= 3.10. Not a definitive fix, but maybe it can help other users:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/11048
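The gating idea behind that PR can be sketched in Python (illustrative names only, not the actual webui code): detect a Navi 1x card, require Python <= 3.10, and fall back to the torch 1.13.1 ROCm 5.2 install command.

```python
# Sketch of the version gating in PR #11048 (illustrative, not the real
# webui code): RX 5000 (Navi 1x) cards fall back to torch 1.13.1, whose
# ROCm wheels only exist up to Python 3.10.
import sys

TORCH_DEFAULT = ("pip install torch==2.0.1+rocm5.4.2 torchvision==0.15.2+rocm5.4.2 "
                 "--index-url https://download.pytorch.org/whl/rocm5.4.2")
TORCH_NAVI1 = ("pip install torch==1.13.1 torchvision==0.14.1 "
               "--index-url https://download.pytorch.org/whl/rocm5.2")

def pick_torch_command(gpu_name: str, python_version=None) -> str:
    """Return the pip command to install torch for this GPU."""
    if python_version is None:
        python_version = sys.version_info[:2]
    # RX 5500 / 5600 / 5700 are the Navi 1x (RDNA1) parts affected here.
    is_navi1 = any(f"RX 5{n}00" in gpu_name for n in "567")
    if not is_navi1:
        return TORCH_DEFAULT
    if tuple(python_version) > (3, 10):
        raise RuntimeError("torch 1.13.1 ROCm wheels require Python <= 3.10")
    return TORCH_NAVI1
```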

cyatarow commented 1 year ago

But still, why is only RX 5000 series soooo incompatible with torch 2.0??

DGdev91 commented 1 year ago

But still, why is only RX 5000 series soooo incompatible with torch 2.0??

That's a good question. My first guess is that we need to force HSA_OVERRIDE_GFX_VERSION to make it work, but that's also true for RX 6000, which is working just fine.

Sooo.... Who knows.

We can't even be really sure it's just RX 5000; maybe there are other series which have problems but no one has reported it yet.

olinorwell commented 1 year ago

HSA_OVERRIDE_GFX_VERSION is already forced though in the script for those cards - it was set correctly for me even when things weren't working. Perhaps Torch v2.0 needs a further workaround or something.

I just hope code doesn't slip into the repo that's only torch 2.0 compatible, then we're in trouble.
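For context, the override in question is an environment variable that makes the ROCm runtime treat the card as a supported gfx target. A launch along these lines matches what webui.sh does for Navi cards (the value shown is the commonly used RDNA spoof; verify it against your own webui.sh):

```shell
# Make ROCm treat a Navi 1x card (gfx1010, e.g. RX 5700 XT) as gfx1030
# so the prebuilt GPU kernels load. Must be set before launch.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
./webui.sh
```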

magusman52 commented 1 year ago

But still, why is only RX 5000 series soooo incompatible with torch 2.0??

That's a good question. My first guess is that we need to force HSA_OVERRIDE_GFX_VERSION to make it work, but that's also true for RX 6000, which is working just fine.

Sooo.... Who knows.

HSA_OVERRIDE_GFX_VERSION has been set by default in webui.sh for a couple of releases now, I think.

We can't even be really sure it's just RX 5000; maybe there are other series which have problems but no one has reported it yet.

Before this card, I ran SD on an RX 580 4GB, which was a nightmare to get running. It didn't have this specific issue, but plenty of other problems that all boiled down to ROCm support.

DGdev91 commented 1 year ago

HSA_OVERRIDE_GFX_VERSION is already forced though in the script for those cards - it was set correctly for me even when things weren't working. Perhaps Torch v2.0 needs a further workaround or something.

Yes, exactly. What I meant was that my first guess was HSA_OVERRIDE_GFX_VERSION causing problems, but that can't be it, because the 6000 series also uses it without issues.

magusman52 commented 1 year ago

Just out of curiosity, would there be any significant performance increase on torch 2.0? It would be interesting to see someone running torch 2.0 on a 5700 XT upload a benchmark to compare against 1.13.1.

DGdev91 commented 1 year ago

Just out of curiosity, would there be any significant performance increase on torch 2.0? It would be interesting to see someone running torch 2.0 on a 5700 XT upload a benchmark to compare against 1.13.1.

It surely would, if we can manage to run it. Especially using --opt-sdp-attention.

On AMD we can't use xformers, and that option would surely be a huge boost.

cyatarow commented 1 year ago

Related reports: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/9951#discussioncomment-5768112 https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/9951#discussioncomment-5964248

As far as I can tell, ROCm does not support RDNA1/Navi 1.x cards.

Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now??

olinorwell commented 1 year ago

Related reports: #9951 (comment) #9951 (comment)

As far as I can tell, ROCm does not support RDNA1/Navi 1.x cards.

Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now??

I believe it doesn't officially, but the special override variable allows it to work. I'm using ROCm 5.2 on a Navi 1.x card.

DGdev91 commented 1 year ago

As far as I can tell, ROCm does not support RDNA1/Navi 1.x cards.

Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now??

No, it can still work with an older PyTorch and that override.

And technically, ROCm doesn't officially support any consumer-grade video card, even though many work just fine with it.

cyatarow commented 1 year ago

The PR #11048 was merged into the dev and release_candidate branches. But... is there really no way to work around the issue other than pinning torch to 1.13.1? Could it be that since RDNA1 is not officially supported by ROCm, torch 2.0 was developed without any consideration of RDNA1??

ethragur commented 1 year ago

Tried it with the new rocm5.5 torch release build in the pytorch nightly repo. The same problem is still present ...

fighuass commented 1 year ago

Can confirm that I have this issue too with my RX 5700 XT. Starting to regret ever buying that GPU, tbh..

Everything worked fine last time I was into using SD, sometime last year or so.

k1llerk3ks commented 1 year ago

I still have this issue with RX 5700 XT. Downgrade to 1.13.1 worked for me, although there is this delay at the beginning of picture creation. I cannot use the sd-xl-base checkpoint with it though... please @AUTOMATIC1111 fix this...

DGdev91 commented 1 year ago

I still have this issue with RX 5700 XT. Downgrade to 1.13.1 worked for me, although there is this delay at the beginning of picture creation. I cannot use the sd-xl-base checkpoint with it though... please @AUTOMATIC1111 fix this...

That probably isn't something related to the Web UI; it's an issue in pytorch itself. Or maybe in ROCm. I'm starting to think the problem here is ROCm, because I also had issues in llama.cpp, both with CLBlast and with a fork which aims to add ROCm support.

Anyway, I found this on pytorch's github, probably related: https://github.com/pytorch/pytorch/issues/106728

cl0ck-byte commented 1 year ago

Anyway, i found this on pytorch's github, probably related pytorch/pytorch#106728

Indeed related: torch>=2.0.0 won't run on RDNA1 for now, even with a torch wheel targeting gfx1010, which is my card in this case.

edit: wow, this is worthless! [image]

DGdev91 commented 6 months ago

Some time ago I found an old pytorch 2.0 build which runs on RX 5000: https://github.com/pytorch/pytorch/issues/106728#issuecomment-1749511711