AUTOMATIC1111 / stable-diffusion-webui

[Bug]: 1.6.0 Hires. fix uses all memory on an AMD 7900 XT #13000

Open Cykyrios opened 1 year ago

Cykyrios commented 1 year ago

Is there an existing issue for this?

What happened?

I saw some reports about NVIDIA issues in v1.6.0, but none about AMD so far. When generating images using hires. fix, 99% (or more) of VRAM is used, which can lead to either an out-of-memory error or my entire PC crashing.

I'm on Manjaro Linux with kernel 6.4.12, running v1.6.0 with Python 3.11.3 and torch 2.1.0 + ROCm 5.5. I also tried a fresh venv (which downloads ROCm 5.6), but the issue still happens there. Below are some screenshots showing GPU usage while generating 512x768 images with x1.5 hires. fix:

v1.5.2: [screenshot of GPU memory usage during generation]

v1.6.0: [screenshot of GPU memory usage during generation]

Lower memory usage corresponds to the first pass, higher memory usage to the hires fix.

Here is the error message when I get the out of memory error:

OutOfMemoryError: HIP out of memory. Tried to allocate 5.70 GiB. GPU 0 has a total capacty of 19.98 GiB of which 5.66 GiB is free. Of the allocated memory 8.05 GiB is allocated by PyTorch, and 5.92 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF

Steps to reproduce the problem

Generate images using hires. fix (batch count > 1; I generally use 8, which makes the error more likely to appear). The memory usage screenshots above were taken while generating 512x768 images upscaled x1.5 in the hires pass. Previously (v1.5.2 and before), I generated 960x540 images upscaled x2 with no issue at all using the same GPU and ROCm version. I used no extensions during generation.

What should have happened?

A 20 GB VRAM GPU should not run out of memory generating 512x768 images upscaled x1.5; as seen in the v1.5.2 screenshot, usage peaks at about 75% (roughly 15 GB).

Sysinfo

sysinfo-2023-09-02.txt

What browsers do you use to access the UI ?

Mozilla Firefox, Brave

Console logs

https://pastebin.com/K1vytEaw

Additional information

As I have an AMD Ryzen 9 7900X CPU (which has an integrated GPU), I added export ROCR_VISIBLE_DEVICES=0 to webui-user.sh so that only my discrete GPU is used.
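For clarity, the line in webui-user.sh looks like this (device index 0 is the 7900 XT on my system; on another machine the index may differ, rocminfo shows the enumeration order):

```bash
# webui-user.sh — expose only the discrete GPU to ROCm so the Ryzen iGPU is ignored
# (device index 0 is the 7900 XT here; check the enumeration order with rocminfo)
export ROCR_VISIBLE_DEVICES=0
```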

0-vortex commented 1 year ago

I have the same issue on an Apple M1 Max with 32 GB, hires destroys everything :<

Cykyrios commented 11 months ago

Is there any news on this issue? Are other users impacted? 1.6.0 was released over 2 months ago and I still have this issue (I just tried creating a new venv to check; it downloaded the latest PyTorch 2.2.0 and ROCm 5.6). Any chance there are memory management issues somewhere in the code base that could explain this? I might have a go at bisecting the issue over the weekend if I have time, to see exactly what caused the problem, but I expect this to take quite a bit of time.

Cykyrios commented 11 months ago

I ended up not bisecting, as any non-release commit crashes immediately when I try to run the webui. However, and I don't want to jinx it, I think I found a nice workaround in the comments of #6460 and added the following line to webui-user.sh: export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.7,max_split_size_mb:512". I played a bit with the 0.7 value; I started with 0.9 but still got memory issues, while with 0.7 I haven't had any problems yet.
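For anyone copy-pasting, the line in context (a minimal sketch; 0.7 and 512 are simply the values that worked for me and may need tuning on other cards):

```bash
# webui-user.sh — work around HIP memory fragmentation during the hires. fix pass:
# start freeing cached allocator blocks once ~70% of VRAM is reserved, and stop the
# allocator from splitting blocks larger than 512 MB (reduces fragmentation)
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.7,max_split_size_mb:512"
```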

NeoChen1024 commented 11 months ago

The problem goes away if InvokeAI optimization is used instead of Doggettx

nonetrix commented 10 months ago

The problem goes away if InvokeAI optimization is used instead of Doggettx

How do you do this exactly?

Cykyrios commented 10 months ago

You can change the optimization method in Settings -> Optimizations (in the Stable Diffusion category, from version 1.7 onwards). I did end up switching to InvokeAI, though I kept the above environment variable as well, and I haven't had a single problem since (with Doggettx, it would still crash or freeze on rare occasions).

I'm not sure whether the issue should be closed with this workaround, since it still appears to be a regression in the Doggettx optimization.
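If anyone prefers forcing this at launch rather than through the UI, a sketch using the webui's --opt-split-attention-invokeai flag (treat it as an alternative to the Settings -> Optimizations dropdown on 1.7+, which is the usual place to set it):

```bash
# webui-user.sh — prefer InvokeAI's cross-attention optimization at startup
# (alternative to selecting it under Settings -> Optimizations on 1.7+)
export COMMANDLINE_ARGS="--opt-split-attention-invokeai"
```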

KirillKocheshkov commented 8 months ago

Changing the optimization method didn't help for me.

nonetrix commented 8 months ago

It still uses all my VRAM with SDXL. I really want to just use Automatic1111; I'm not a fan of ComfyUI (it's too much work IMO), but ComfyUI doesn't have this issue at all. What is the core of the issue? The VAE decode seems to be the worst part and is where it usually crashes.

Oh yeah, it also likes to make my PC randomly reset, even though my system is otherwise completely stable, even when running LLMs :D

hqnicolas commented 5 months ago

@Cykyrios You can use my repo to install Stable Diffusion with ROCm on an RX 7900 XT. It solves the AMD ROCm RDNA2 & 3 problems using Docker containers on Linux: https://github.com/hqnicolas/StableDiffusionROCm. It was stable as of 1.9.3 (latest). If you like this automation repo, please leave a star on it ⭐

nonetrix commented 5 months ago

> @Cykyrios You can use my repo to install Stable Diffusion with ROCm on an RX 7900 XT. It solves the AMD ROCm RDNA2 & 3 problems using Docker containers on Linux: https://github.com/hqnicolas/StableDiffusionROCm. It was stable as of 1.9.3 (latest). If you like this automation repo, please leave a star on it ⭐

Hey, out of curiosity, does this use ROCm 6.1? There are some fixes that might have come to my GPU in that release; I've been meaning to test them out on Ubuntu with a manually updated ROCm, but I haven't had the chance.

hqnicolas commented 5 months ago

> Hey, out of curiosity, does this use ROCm 6.1? There are some fixes that might have come to my GPU in that release; I've been meaning to test them out on Ubuntu with a manually updated ROCm, but I haven't had the chance.

@Beinsezii wanna make this repo an AMD page for Stable Diffusion? I think these ROCm fixes will make a difference too.

You will need to change the driver installation bash script from 6.0 to 6.1, and I think you will need to change the PyTorch build from ROCm 5.6 to 6.1 too.

nonetrix commented 5 months ago

> Hey, out of curiosity, does this use ROCm 6.1? There are some fixes that might have come to my GPU in that release; I've been meaning to test them out on Ubuntu with a manually updated ROCm, but I haven't had the chance.
>
> @Beinsezii wanna make this repo an AMD page for Stable Diffusion? I think these ROCm fixes will make a difference too.
>
> You will need to change the driver installation bash script from 6.0 to 6.1, and I think you will need to change the PyTorch build from ROCm 5.6 to 6.1 too.

This line, I would assume? `ENV TORCH_COMMAND="pip install torch==2.1.2+rocm5.6 torchvision==0.16.2+rocm5.6 --extra-index-url https://download.pytorch.org/whl/rocm5.6"`

hqnicolas commented 5 months ago

> This line, I would assume? `ENV TORCH_COMMAND="pip install torch==2.1.2+rocm5.6 torchvision==0.16.2+rocm5.6 --extra-index-url https://download.pytorch.org/whl/rocm5.6"`

Yes, but you will need to test compatibility with the installation image, because there are 3 layers of ROCm:
1 - Bare-metal driver: ROCm 6.0
2 - Docker image: already ROCm 6.1
3 - Python torch plugin: ROCm 5.6

I think you can change the bare-metal driver first and measure the results.
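For reference, a sketch of what the updated TORCH_COMMAND might install once the container's PyTorch is moved to a ROCm 6.1 build (the exact torch/torchvision versions below are an assumption; check https://download.pytorch.org/whl/rocm6.1 for the wheels that actually pair with ROCm 6.1):

```bash
# assumed ROCm 6.1 counterpart of the repo's TORCH_COMMAND pip line
# (torch 2.4.1 / torchvision 0.19.1 are a guess at the matching +rocm6.1 builds; verify first)
pip install torch==2.4.1+rocm6.1 torchvision==0.19.1+rocm6.1 \
    --extra-index-url https://download.pytorch.org/whl/rocm6.1
```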