cyatarow opened 1 year ago
I have exactly the same issue, used to work perfectly before.
Like you say, it just sits there and doesn't do anything, no errors anywhere.
I've uninstalled/reinstalled everything and tried various different combinations, no good.
Previously I would get the classic warning: "MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade.", but after about a minute it would start and then work correctly. Now I don't get that warning, which suggests it falters before that point.
I'm using an AMD Radeon RX 5700 XT (8GB), Ryzen 3700 CPU, Arch Linux. So similar to you but not exactly the same.
Fingers crossed somebody can suggest something! Previously on this system I've had SD working well through all the updates from September last year to a couple of weeks ago.
Same issue, no errors, just not generating anything. AMD Radeon RX 5700 XT, Ryzen 3600, Manjaro, kernel 6.3.4-2.
Could it be this problem is specific to RX 5000 series?
I fear it might be related to the fact that the 5000 series wasn't supposed to work originally. We then got a workaround that 'fooled' something into believing it was a different chip, after which it worked. Perhaps that trick isn't working now, and it's simply unable to function. There must be many others in the same situation out there. Hopefully they will all comment on this post.
To confirm to anyone trying to help - at least in my case it used to immediately give the warning: "MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_40.kdb Performance may degrade."
This no longer happens. So whatever is different happens after the Generate button is hit and before the warning would be printed.
[Edit: Additionally, I ran the tests for PyTorch found here: https://pytorch.org/get-started/locally/ , which suggests that PyTorch ROCm is working as expected]
[Edit 2: Not sure if it's useful to know, but I did recently install OpenCL on my machine, and I was reading that the OpenCL and HIP backends are potentially not compatible side-by-side when using ROCm. I don't fully understand all of this, but my gut feeling is it could be something to do with that. Then again, maybe others haven't recently installed OpenCL]
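For reference, the "verify installation" step from the pytorch get-started page amounts to something like the sketch below; `describe_torch` and the guarded import are my additions so the snippet also runs where torch isn't installed.

```python
# Minimal sketch of the verification step from https://pytorch.org/get-started/locally/ .
import importlib.util

def describe_torch() -> str:
    """Report whether torch is importable and whether a HIP/ROCm device is visible."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    hip = getattr(torch.version, "hip", None)  # set on ROCm builds, None on CUDA/CPU builds
    gpu = torch.cuda.is_available()            # ROCm devices are also exposed via the cuda API
    return f"torch {torch.__version__}, hip={hip}, gpu_available={gpu}"

print(describe_torch())
```

Note that this only proves torch can see the GPU; as the thread shows, kernels can still hang later during generation.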
In fact, inspired by this PR, I had tried the dev branch shortly before v1.3.0 was released. But the result was the same...
The participants in the PR were only RX 6000 users, and I think the merge went through without proper verification on the 5000 series.
I agree, I fear that change is what has broken it for RX 5000 users. According to that PR it was needed because old versions were no longer available on the pytorch repos. I wonder if they are still available elsewhere. I fear we're going to need the 1.13 version again, avoiding the 2.0 version which doesn't appear to work. It's at times like this that I really get mad at myself for updating anything! It was all working so well.
But I have the exact same issue on the 6600M (gfx1031?) with a Ryzen 7 5800H. Without --medvram it doesn't proceed after "Applying optimization: sdp-no-mem... done." With it, the model loads but nothing generates and nothing else happens in the terminal.
Same here (RX 5700) with ROCm 5.5
The only solution for now is to force downgrade to torch 1.13.1
pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2
Has anyone tried with a torch 2.0 build for ROCm version 5.5? For now the newest one in nightly is still 5.4.2: https://download.pytorch.org/whl/nightly/torch/
Even force downgrading was failing for me; the instructions I had included a '+rocm' suffix next to the package versions. When I tried without it, it appeared to download the Nvidia versions.
What would be the way to try the 5.5 version? I can try that now.
You would have to build pytorch yourself with the ROCm 5.5 version. Maybe something like #9591, the docker image they use does not exist anymore, but the one from the official pytorch docker repo could still work (https://hub.docker.com/r/rocm/pytorch/tags)
rocm/pytorch:rocm5.5_ubuntu20.04_py3.8_pytorch_staging
But I'm not really sure that would make it work; even if we manage to compile it, maybe something in the new pytorch version just doesn't work with RX 5000 graphics cards.
Maybe you had '--extra-index-url' instead of '--index-url'. You could also just go into your venv directory:
stable-diffusion-webui/venv/lib/python3.10/site-packages
and delete torch & torchvision. Afterwards you should just be able to use my pip install cmd.
Additionally I added
export TORCH_COMMAND="pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2"
to my webui-user.sh (note: no space after the '='), and I started the webui with ./webui.sh
(venv) [oli@ARCH-RYZEN stable-diffusion-webui]$ pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2
Looking in indexes: https://download.pytorch.org/whl/rocm5.2
ERROR: Could not find a version that satisfies the requirement torch==1.13.1 (from versions: none)
ERROR: No matching distribution found for torch==1.13.1
I wonder if the fact they bumped the Python version up to 3.11 makes a difference? I see you were running 3.10.
https://download.pytorch.org/whl/rocm5.2/torch/ It looks like it; pytorch only seems to have builds for 3.10 and older there.
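The "No matching distribution found" error above is pip refusing wheels whose CPython tag doesn't match the running interpreter. A small illustration of that tag check (`wheel_matches` is a hypothetical helper, not pip's actual implementation):

```python
# Wheel filenames encode the interpreter they were built for:
# {dist}-{version}-{python tag}-{abi tag}-{platform}.whl
def wheel_matches(filename: str, py_tag: str) -> bool:
    """Crude check: does this wheel's python tag match the given interpreter tag?"""
    parts = filename[: -len(".whl")].split("-")
    return parts[-3] == py_tag

wheel = "torch-1.13.1+rocm5.2-cp310-cp310-linux_x86_64.whl"
print(wheel_matches(wheel, "cp310"))  # True on Python 3.10
print(wheel_matches(wheel, "cp311"))  # False on 3.11, hence "No matching distribution found"
```

Since the rocm5.2 index stops at cp310 wheels, a 3.11 interpreter finds nothing it can install.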
I'm retrying now with 3.10. Fingers crossed.
Otherwise you could try to download the .whl file and just install it directly with pip:
pip install /path/to/file.whl
Success! @ethragur is the hero, his solution has worked for me. I'm now running v1.3.0 of A1111 on my 5700XT.
My solution was this: ensure you have Python 3.10, and edit the webui.sh file to make sure it uses Python 3.10.
Run webui.sh and let it create the venv etc and then fail to create an image.
Run:
source venv/bin/activate
Then run (thanks to @ethragur)
pip install torch==1.13.1 torchvision==0.14.1 --index-url https://download.pytorch.org/whl/rocm5.2
Now restart webui.sh and this time image generation will succeed, you'll see at the bottom of A1111 that the version number says "torch: 1.13.1+rocm5.2".
Hopefully what has worked for me will work for others too, thanks again to @ethragur for the help - I was getting very down at not having SD to play with!
Perfect, good to hear that it works again. Hopefully some future builds of pytorch will also work again with the rx5000 series, otherwise we'll be stuck on this version forever :cry:. From what I've seen, 2.0 should give some performance improvements.
I'll try building the new version in a docker container, and if it works I'll upload the .whl file somewhere. But I do not have high hopes. Maybe there is some way to get more debug information out of pytorch to see where it is stuck
Have any contributors noticed this issue?
v1.3.1, released yesterday, doesn't seem to have this fix... too bad.
@AUTOMATIC1111 please don't ignore us...
Same issue, 5700 XT, on both torch 1.13.1 and 2.0. Oddly enough, I just borrowed this card today from a friend and managed to get a single gen in before this bug occurred.
EDIT: It started generating the entire prompt in a couple of seconds, after waiting for 2 minutes. After that incident, my system became really sluggish. Prompts were generating again, but the speed was inconsistent.
Is this Windows or Linux?
For me it was cut and dry, torch 2.0 doesn't work, torch 1.13.1 does. Perhaps check versions, etc? I always have a one minute delay before generations begin each time, but that's been like that since the beginning, and after it's done what it needs to do then I don't experience problems afterwards.
I'm on Ubuntu 22.04. And yes, it occurs with both versions of torch. The prompt loads for a minute or two, the first 90% of the gen gets done in a couple of seconds, it gets stuck at 97% again for a while, and then finishes. Also my system seems to get really unstable after prompting, as if it's about to crash or blackscreen. Quite odd.
EDIT: Tested again, now it only occurs on torch 2.0. Works alright on 1.13.1 besides the initial lag.
I made a PR to force pytorch 1.13.1 for RX 5000 cards. also checks for python <= 3.10 Not a definitive fix, but maybe it can help other users
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/11048
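For illustration, the spirit of that check could be sketched like this. `pick_torch_command`, the arch set, and the fallback index URL are all hypothetical names for this sketch; see the PR itself for the actual webui.sh change.

```python
import sys

# Sketch: pin torch 1.13.1+rocm5.2 for Navi 1x (gfx101x) cards, which hang on
# torch 2.0, and refuse Python >= 3.11, for which those wheels don't exist.
NAVI1_ARCHES = {"gfx1010", "gfx1011", "gfx1012"}

def pick_torch_command(gfx_arch: str, python_version=None) -> str:
    """Return the pip command a launcher could use for this GPU architecture."""
    ver = tuple(python_version or sys.version_info[:2])
    if gfx_arch in NAVI1_ARCHES:
        if ver >= (3, 11):
            raise RuntimeError("torch 1.13.1 ROCm wheels only exist for Python <= 3.10")
        return ("pip install torch==1.13.1 torchvision==0.14.1 "
                "--index-url https://download.pytorch.org/whl/rocm5.2")
    # Fallback for other cards (index URL illustrative only).
    return "pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.4.2"
```

The point is that the pin is scoped to Navi 1x only, so RX 6000 users keep getting torch 2.0.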
But still, why is only RX 5000 series soooo incompatible with torch 2.0??
That's a good question. My first guess is that we need to force HSA_OVERRIDE_GFX_VERSION to make it work, but that's also true for RX 6000, which is working just fine.
Sooo.... Who knows.
We can't even be really sure it's just RX 5000; maybe other series have problems too but no one has reported it yet.
HSA_OVERRIDE_GFX_VERSION is already forced though in the script for those cards - it was set correctly for me even when things weren't working. Perhaps Torch v2.0 needs a further workaround or something.
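For anyone unfamiliar with the workaround being discussed, it is just an environment variable set before launch. A minimal excerpt, assuming the usual value used for Navi 1x cards (gfx1010 has no official ROCm support, so the HSA runtime is told to treat the card as gfx1030, for which kernels do ship):

```shell
# Tell the HSA runtime to report the GPU as gfx1030 (10.3.0 maps to gfx1030).
export HSA_OVERRIDE_GFX_VERSION=10.3.0
echo "HSA_OVERRIDE_GFX_VERSION=$HSA_OVERRIDE_GFX_VERSION"
# then launch the UI as usual: ./webui.sh
```

This matches the MIOpen warning earlier in the thread, which references a gfx1030 database file even on a gfx1010 card.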
I just hope code doesn't slip into the repo that's only torch 2.0 compatible, then we're in trouble.
HSA_OVERRIDE_GFX_VERSION is already enabled by default in webui.sh since a couple releases I think,
Before this card, I ran SD on an RX 580 4GB, which was a nightmare to get running. It didn't have this specific issue, but plenty of other problems that all boiled down to ROCm support.
Yes, exactly. What I meant was that my first guess was HSA_OVERRIDE_GFX_VERSION causing problems, but that can't be it, because the 6000 series also uses it without issues.
Just out of curiosity, would there be any significant increase in performance on torch 2.0? Would be interesting to see someone on torch 2.0 with a 5700XT upload a benchmark, to compare to 1.13.1
It surely would, if we can manage to run it, especially using --opt-sdp-attention.
On AMD we can't use xformers, and that option would surely be a huge boost.
Related reports: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/9951#discussioncomment-5768112 https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/9951#discussioncomment-5964248
As far as I can tell, ROCM does not support RDNA1/Navi1.x cards.
Really? So, was it wrong of me to buy an RX 5000 GPU? And should I sell it right now??
I believe it doesn't officially, but the special override variable allows it to work. I'm using ROCm 5.2 on a Navi 1.x card.
No, it can still work with an older PyTorch and that override.
And technically ROCm doesn't officially support any consumer-grade video card, even though they work just fine with it.
PR #11048 was merged into the dev and release_candidate branches. But... is there really no way to work around the issue other than pinning torch to 1.13.1? Could it be that, since RDNA1 is not officially supported by ROCm, torch 2.0 was developed without any consideration for RDNA1??
Tried it with the new ROCm 5.5 torch build from the pytorch nightly repo. The same problem is still present...
Can confirm that I have this issue too with my RX 5700 XT. Starting to regret ever buying that GPU, tbh..
Everything worked fine last time I was into using SD, sometime last year or so.
I still have this issue with the RX 5700 XT. Downgrading to 1.13.1 worked for me, although there is a delay at the beginning of picture creation. I cannot use the sd-xl-base checkpoint with it though... please @AUTOMATIC1111 fix this...
That probably isn't related to the Web UI; it's an issue in pytorch itself, or maybe in ROCm. I'm starting to think the problem here is ROCm, because I also had issues in llama.cpp, both with clblast and with a fork which aims to add ROCm support.
Anyway, I found this on pytorch's github, probably related: https://github.com/pytorch/pytorch/issues/106728
Indeed related: torch>=2.0.0 won't run on RDNA1 for now, even with a torch wheel targeting gfx1010, which is my card in this case.
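For anyone checking whether a given torch wheel even contains Navi 1x kernels, torch exposes the list of compiled GPU targets. A small diagnostic sketch (the guarded import is an addition so it also runs where torch isn't installed):

```python
# A ROCm build of torch records which gfx targets its kernels were compiled
# for. If 'gfx1010' is missing from this list, Navi 1x has no kernels in the
# wheel at all; if it's present, the hang lies deeper, as reported above.
import importlib.util

def supported_arches() -> list:
    if importlib.util.find_spec("torch") is None:
        return []  # torch not installed in this environment
    import torch
    return torch.cuda.get_arch_list()  # e.g. ['gfx900', 'gfx1030', ...] on ROCm builds

print(supported_arches())
```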
edit: wow, this is worthless!
A while ago I found an old pytorch 2.0 build which runs on the RX 5000 series: https://github.com/pytorch/pytorch/issues/106728#issuecomment-1749511711
Is there an existing issue for this?
What happened?
I have newly installed v1.3.0, but image generation won't start, even many minutes after pressing the "Generate" button.
Steps to reproduce the problem
What should have happened?
Image generation should have started.
Commit where the problem happens
20ae71faa8ef035c31aa3a410b707d792c8203a3
What Python version are you running on ?
Python 3.10.x
What platforms do you use to access the UI ?
Linux
What device are you running WebUI on?
AMD GPUs (RX 5000 below)
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
List of extensions
(None)
Console logs
Additional information
My environment: