AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0
[Bug]: "Memory access fault by GPU node-1" error with RX 6600 on Linux #8139

Open Yumae opened 1 year ago

Yumae commented 1 year ago

What happened?

When I try to generate pictures above a certain resolution, I get this error in the console window. I can consistently reproduce it by generating a picture larger than 768x1024/1024x768. I'm sure I could go higher than that with the amount of VRAM this card has, since the KDE resource monitor shows VRAM usage never reaching 7 GB. The screenshot shows the generation process reaching 100%, but when it tries to output the image it prints that error instead. [Screenshot_20230226_130106]
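The VRAM figure above comes from a desktop resource monitor. As a cross-check, here is a minimal sketch that reads the counters directly, assuming the amdgpu kernel driver exposes `mem_info_vram_used` and `mem_info_vram_total` (byte counts) under `/sys/class/drm/card*/device/`. The function name `vram_usage` is hypothetical.

```shell
# Print used/total VRAM in MiB for one amdgpu device directory.
# Assumption: the directory holds mem_info_vram_used and
# mem_info_vram_total, each containing a byte count.
vram_usage() {
    local dev_dir="$1"
    local used total
    [ -r "$dev_dir/mem_info_vram_used" ] || return 1
    [ -r "$dev_dir/mem_info_vram_total" ] || return 1
    used=$(cat "$dev_dir/mem_info_vram_used")
    total=$(cat "$dev_dir/mem_info_vram_total")
    echo "$(( used / 1048576 )) MiB / $(( total / 1048576 )) MiB"
}

# Report every card that exposes the counters; cards without them are skipped.
for dev in /sys/class/drm/card*/device; do
    vram_usage "$dev" 2>/dev/null || true
done
```

Running this repeatedly during a generation (for example in a second terminal) would show whether usage really stays below the card's capacity right before the fault.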

Steps to reproduce the problem

Generate a picture with a resolution higher than 1024x768 like for example 1280x768.

What should have happened?

It should output the picture and it should let me generate at higher resolutions as well.

Commit where the problem happens

3715ece0

What platforms do you use to access the UI?

Linux

What browsers do you use to access the UI?

Mozilla Firefox

Command Line Arguments

export COMMANDLINE_ARGS="--medvram --listen"

List of extensions

wildcards openpose-editor stable-diffusion-webui-dataset-tag-editor stable-diffusion-webui-images-browser stable-diffusion-webui-pixelization

Console logs

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on anon user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]
Commit hash: 3715ece0adce7bf7c5e9c5ab3710b2fdc3848f39
Installing requirements for Web UI

Launching Web UI with arguments: --medvram --listen
No module 'xformers'. Proceeding without it.
Loading weights [c353313f5d] from /home/anon/stable-diffusion-webui/models/Stable-diffusion/AOM2-nutmegmixGav2+ElysV2.safetensors
Creating model from config: /home/anon/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: /home/anon/stable-diffusion-webui/models/VAE/sd15.vae.pt
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(25): chr-atagorq, chr-ayanami, chr-bremertonsummer, chr-honolulu, chr-shun, chr-shylily, chr-sirius, chr-stlouislux, chr-taihou, chr-yamashiro, chr-yukikazepan, ero-lactation, ero-doggystyle, ero-deepmissionary, spe-centaur, chr-nahida, spe-mothgirl, chr-okayu, spe-lamia, chr-senko, chr-i19, chr-lumine, chr-kashino, chr-yuudachi, EasyNegative
Model loaded in 18.8s (load weights from disk: 7.2s, create model: 1.1s, apply weights to model: 8.2s, apply half(): 0.6s, load VAE: 1.6s).
[tag-editor] Settings has been read from config.json
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
  0%|                                                                                  | 0/20 [00:00<?, ?it/s]MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_14.kdb Performance may degrade. Please follow instructions to install: https://github.com/ROCmSoftwarePlatform/MIOpen#installing-miopen-kernels-package
100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00,  2.66it/s]
100%|█████████████████████████████████████████████████████████████████████████| 20/20 [00:44<00:00,  2.23s/it]
Memory access fault by GPU node-1 (Agent handle: 0x559e97c5f090) on address 0x7f91800e1000. Reason: Page not present or supervisor privilege.

Warning: Program '/home/anon/stable-diffusion-webui/webui.sh' crashed.

Additional information

Distro: EndeavourOS (ArchLinux)
DE: KDE on X11
CPU: Ryzen 1600
GPU: RX 6600 (8GB VRAM)
RAM: 16GB

The WebUI was installed with the default script. I didn't change the ROCm version or anything related, since the script took care of that automatically. I can generate pictures at or below 1024x768 with no problems. I get the same error both with and without highres fix enabled. [Screenshot_20230226_132136]

raff766 commented 1 year ago

Can confirm, I'm having the same exact issue with my RX 6800 XT (16GB VRAM)

Parzival1608vonKatze commented 1 year ago

Same here, exactly the same issue. (RX 6700XT 12GB)

ishawn944 commented 1 year ago

You can try the following commands:

sudo usermod -a -G video $USER
sudo usermod -a -G render $USER
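Before changing group membership, it may help to see which of these groups the current user already belongs to. A small sketch using `id -nG`; the helper name `has_group` is hypothetical. Note that a `usermod` change normally takes effect only after logging in again.

```shell
# Return success if the whitespace-separated group list in $1 contains group $2.
has_group() {
    local list="$1" wanted="$2" g
    for g in $list; do
        [ "$g" = "$wanted" ] && return 0
    done
    return 1
}

# Groups of the current login session.
current_groups=$(id -nG)
for grp in video render; do
    if has_group "$current_groups" "$grp"; then
        echo "$grp: member"
    else
        echo "$grp: not a member"
    fi
done
```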

Set the environment variable for SD:

PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128

With this setting, the allocator starts reclaiming cached GPU memory once usage reaches 60% of capacity, and it will not split blocks larger than 128 MB, which can help reduce memory fragmentation.

You may also need to add the --medvram flag. These worked for my RX 6750XT.
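One way to apply these settings persistently might be `webui-user.sh`, which the original report already uses for `COMMANDLINE_ARGS`. This is a sketch only: the values are the ones quoted in this thread, not tuned recommendations.

```shell
# webui-user.sh fragment (sketch)

# Allocator settings quoted above: start reclaiming cached blocks at 60%
# usage and do not split blocks larger than 128 MB.
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:128"

# Flags from the original report; --medvram is the flag suggested above.
export COMMANDLINE_ARGS="--medvram --listen"
```

Exporting the variable in the same shell that launches `webui.sh` should have the same effect for a single run.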

Ridien commented 1 year ago

Running the WebUI using --no-half and --lowvram solved it for me.

popemkt commented 1 year ago

Can confirm the same with RX 6800S 8GB

mlrey7 commented 1 year ago

Upgrading to PyTorch 2.0 and ROCm 5.4.2 fixed this for me. Using --opt-sub-quad-attention also really helps, alongside --medvram and PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128. Together, these let me use hires fix from 512x768 at 1.85x (944x1420) on the RX 6600 with 8 GB VRAM.

DGdev91 commented 1 year ago

I have the same problem on a 5700XT using ROCm 5.4.2 and PyTorch 2.0. Strangely, it works fine using PyTorch 1.13.1.

I get the same issue with both --medvram and --lowvram.

With PyTorch 2 I also tried --opt-sdp-attention, with no effect.

I also use --precision full and --no-half.

Finally, I tried export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128, which did not help.
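Because reports in this thread differ by PyTorch and ROCm version, it can be useful to state both exactly. ROCm wheels from download.pytorch.org carry a local build tag after a `+`, as in `1.13.1+rocm5.2`. A minimal sketch that splits such a string follows; the function names are hypothetical.

```shell
# Release part of a PyTorch version string, e.g. "1.13.1" from "1.13.1+rocm5.2".
torch_release() {
    echo "${1%%+*}"
}

# Local build tag after the "+", e.g. "rocm5.2"; empty if there is none.
torch_build_tag() {
    case "$1" in
        *+*) echo "${1#*+}" ;;
        *)   echo "" ;;
    esac
}

ver="1.13.1+rocm5.2"
echo "release: $(torch_release "$ver"), build: $(torch_build_tag "$ver")"
```

Inside the WebUI's venv, the string could be obtained with `python -c 'import torch; print(torch.__version__)'` and passed to these helpers.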

egolfbr commented 1 year ago

> I have the same problem on a 5700XT using ROCm 5.4.2 and PyTorch 2.0. Strangely, it works fine using PyTorch 1.13.1.
>
> I get the same issue with both --medvram and --lowvram.
>
> With PyTorch 2 I also tried --opt-sdp-attention, with no effect.
>
> I also use --precision full and --no-half.
>
> Finally, I tried export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128, which did not help.

I have a very similar setup on an Ubuntu machine. I downgraded to PyTorch 1.13.1 and everything appears to be fine, except for a warning about a missing database file:

MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_20.kdb Performance may degrade. Please follow instructions to install: https://github.com/ROCmSoftwarePlatform/MIOpen#installing-miopen-kernels-package

DianaNites commented 1 year ago

@egolfbr That's a harmless warning from AMD; see https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/doc/src/cache.md

TL;DR: AMD ROCm compiles and caches GPU kernels in the background, but it also ships pre-compiled GPU kernels for some cards. The version bundled with PyTorch 1.x does not seem to include a copy for your card, but the only effect should be that the first image you generate may be slow.
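Since the MIOpen message is only a warning while the memory access fault ends the process, a saved console log can be triaged by searching for each string. A sketch, assuming the log was captured to a file; the function name `classify_log` and the file name `webui.log` are hypothetical.

```shell
# Classify a saved console log:
#   "fault"        if the GPU memory access fault appears,
#   "warning-only" if only the MIOpen database warning appears,
#   "clean"        otherwise.
classify_log() {
    local log="$1"
    if grep -qF "Memory access fault by GPU node" "$log"; then
        echo "fault"
    elif grep -qF "Missing system database file" "$log"; then
        echo "warning-only"
    else
        echo "clean"
    fi
}
```

A log could be captured with `./webui.sh 2>&1 | tee webui.log` and then checked with `classify_log webui.log`.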

skerit commented 1 year ago

Had the same error while using a LoRA model, but I was still using torch 1.12. Upgrading to 1.13.1 fixed it for me.

torgeir commented 1 year ago

The following fixed an issue similar to OP

index 49a426ff..03b57253 100644
--- a/webui-user.sh
+++ b/webui-user.sh
@@ -10,7 +10,7 @@
 #clone_dir="stable-diffusion-webui"

 # Commandline arguments for webui.py, for example: export COMMANDLINE_ARGS="--medvram --opt-split-attention"
-#export COMMANDLINE_ARGS=""
+#export COMMANDLINE_ARGS="--reinstall-torch"

 # python3 executable
 #python_cmd="python3"
@@ -27,6 +27,9 @@
 # install command for torch
 #export TORCH_COMMAND="pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113"

+# https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/8139
+export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:128
+
 # Requirements file to use for stable-diffusion-webui
 #export REQS_FILE="requirements_versions.txt"
--- a/webui.sh
+++ b/webui.sh
@@ -119,7 +119,7 @@ esac
 if echo "$gpu_info" | grep -q "AMD" && [[ -z "${TORCH_COMMAND}" ]]
 then
     # AMD users will still use torch 1.13 because 2.0 does not seem to work.
-    export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"
+    export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.4.2"
 fi  

 for preq in "${GIT}" "${python_cmd}"

Arch, RX6800XT

Essoje commented 1 year ago

Confirming the above has solved the 'Memory access fault by GPU node-1' problem on my machine. However, while the above would work without a problem on a clean installation, I was forced to additionally use the --ignore-installed flag on the pip install command as follows.

TORCH_COMMAND="pip install --ignore-installed torch torchvision --index-url https://download.pytorch.org/whl/rocm5.4.2"

Manjaro, RX6900XT
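The `webui.sh` hunk above applies the AMD default only when `TORCH_COMMAND` is empty, so an alternative to patching `webui.sh` may be to export the command from `webui-user.sh`. This sketch assumes `webui-user.sh` is sourced before that check runs, which its commented-out `TORCH_COMMAND` line suggests. The `gpu_info` value is stubbed purely for illustration, and the condition is rewritten with POSIX `[ ]`.

```shell
# webui-user.sh side: choose the torch install command explicitly.
export TORCH_COMMAND="pip install --ignore-installed torch torchvision --index-url https://download.pytorch.org/whl/rocm5.4.2"

# webui.sh side (adapted from the diff above): the pinned AMD default is
# used only if TORCH_COMMAND is still empty at this point.
gpu_info="AMD"   # stub for illustration; webui.sh detects this itself
if echo "$gpu_info" | grep -q "AMD" && [ -z "${TORCH_COMMAND}" ]
then
    export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --index-url https://download.pytorch.org/whl/rocm5.2"
fi

# The user-supplied command survives the check.
echo "$TORCH_COMMAND"
```

Keeping the change in `webui-user.sh` would also avoid conflicts when `webui.sh` is updated by a later `git pull`.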

lufixSch commented 11 months ago

Just wanted to add, for anyone finding this.

sudo usermod -a -G video $USER
sudo usermod -a -G render $USER

For some reason I got this error after adding my user to the video and render groups. After removing the user from those groups, everything worked again.

sudo usermod -r -G video $USER
sudo usermod -r -G render $USER

juipeltje commented 9 months ago

I'm having the same problem with Fooocus running on Void Linux with a 6950 XT. I tried pretty much every solution in this thread to no avail, but what seems to work as a workaround for me now is to run it with --always-no-vram and --always-offload-from-vram. I'm not sure whether A1111 has similar flags, but it may be worth a shot. It is a little slower than using VRAM, but it still easily beats running on the CPU, and at least now I can leave it running to generate a bunch of images without it crashing every other image. If you have the extra system RAM available, it might be a good band-aid solution.