AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Major VRAM management issues #7451

Open Myridium opened 1 year ago

Myridium commented 1 year ago

Is there an existing issue for this?

What happened?

Frequent CUDA out-of-memory errors.

These have been reported several times by other users, but those reports have blamed specific features such as the hi-res fix, live preview generation, or a particular commit. I think there is something decidedly wrong with the memory management of stable-diffusion-webui as a whole, and that the other complaints are just manifestations of this poor memory management.

I have been getting inconsistent CUDA out-of-memory errors despite demonstrably having more than double the VRAM required to complete the upscaling task (I have successfully upscaled two images simultaneously a few times, but it rarely works).

I have been running this program from a framebuffer console on Linux to ensure that nothing else touches the GPU. Indeed, nvidia-smi confirms that the GPU is actually powered off before stable-diffusion-webui is launched. I repeat: nothing touches the GPU except this program, and this program produces CUDA out-of-memory errors on an inconsistent basis, in situations where I have previously completed a task requiring double the VRAM.

I have received out-of-memory errors where there was sufficient free VRAM but it could not be allocated (which implies fragmentation), and I have received out-of-memory errors where there genuinely was not enough free VRAM.

It seems clear that there is a VRAM memory leak in this program.
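
For what it's worth, a quick way to tell a genuine leak from fragmentation is to log what PyTorch itself believes it is holding between jobs. The snippet below is only an illustrative sketch (it is not webui code, and `log_vram` is a hypothetical helper):

```python
import torch

def log_vram(tag: str) -> None:
    """Print PyTorch's view of VRAM on the current GPU, in MiB."""
    allocated = torch.cuda.memory_allocated() / 2**20  # memory held by live tensors
    reserved = torch.cuda.memory_reserved() / 2**20    # memory cached by the allocator
    print(f"[{tag}] allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB")

# Call log_vram("before") / log_vram("after") around each generation or upscale:
# 'allocated' creeping up and never returning to its baseline looks like a leak,
# while 'reserved' far exceeding a stable 'allocated' points at fragmentation
# inside PyTorch's caching allocator rather than at lost tensors.
```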

Steps to reproduce the problem

Do anything memory-intensive. Upscale multiple images in batches. Generate live previews while upscaling a single image. That kind of thing. Live previews appear to make the issue significantly worse.

What should have happened?

If I am able to upscale two images simultaneously, then this demonstrates that there is sufficient VRAM for the task. This should be repeatable, without out-of-memory errors.

The program should properly allocate and release VRAM, and it should explicitly defragment the memory if necessary.
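
To be fair, PyTorch's caching allocator cannot compact memory around tensors that are still alive; the closest a Python-level caller can get to "defragmenting" is to drop references and hand cached blocks back to the driver. A minimal sketch of that pattern (illustrative only, not the webui's actual cleanup code):

```python
import gc
import torch

def release_cached_vram() -> None:
    # First let Python collect anything that is no longer referenced,
    # so the corresponding CUDA blocks become reusable...
    gc.collect()
    # ...then return blocks that are cached but unused to the driver.
    # This does not move live tensors, so true fragmentation can persist.
    torch.cuda.empty_cache()
```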

Commit where the problem happens

I've tried a few, dating back to about 2023-01-20.

What platforms do you use to access the UI ?

No response

What browsers do you use to access the UI ?

No response

Command Line Arguments

I've tried many combinations; the memory leak is present with any of them.
Of course, using options which reduce memory consumption makes the problem less severe, but it is still there.
In particular, `--medvram` helps a great deal.

List of extensions

None in use.

Console logs

I can't run it right now to get a console log. It's the same "CUDA out of memory" error message you've seen in tons of other bug reports. Sometimes the reported free memory is slightly greater than the requested amount (implying that fragmentation caused the failure), and sometimes there is simply not enough free memory.
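
As an aside, for the fragmentation-looking failures PyTorch's own error text suggests capping how the allocator splits blocks. A hedged example, assuming a PyTorch build that honours `max_split_size_mb`:

```python
import os

# The allocator reads this when it first touches the GPU, so set it before any
# CUDA allocation happens (e.g. before the model is moved to the device).
# 128-512 MB is a commonly suggested range; the value is a trade-off, not a fix.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # safe either way, as long as nothing has allocated on the GPU yet
```

The same variable can equally be exported in the shell or in webui-user before launching.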

Additional information

Please add unit tests and get to the bottom of the memory leak.

ataa commented 1 year ago

What sampler do you use? I reported a GPU memory leak (when using DPM++ SDE Karras) a while ago and also cross-posted it to the k-diffusion repo. (The bug report disappeared after a few hours!)

#6678

gsgoldma commented 1 year ago

What sampler do you use? I reported a GPU memory leak (when using DPM++ SDE Karras) a while ago and also cross-posted it to the k-diffusion repo. (The bug report disappeared after a few hours!) #6678

I use Euler a and experience the memory issue.

Myridium commented 1 year ago

What sampler do you use? I reported a GPU memory leak (when using DPM++ SDE Karras) a while ago and also cross-posted it to the k-diffusion repo. (The bug report disappeared after a few hours!)

#6678

Usually DPM++ 2M Karras.

hopto-dot commented 1 year ago

I can confirm: some recent commit (2-3 days ago) no longer allows me to perform img2img at dimensions as high as before.

As a test I used the following parameters for img2img:
Sampling method: Euler a
Batch count and batch size: 1
CFG Scale: 7
Dimensions: 1296 x 2064
Denoising strength: 0.25
Steps: 20
Script: None

I did this on an RTX 3090 (24 GB) with a clean installation, and I get the following errors:

venv "E:\git\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 226d840e84c5f306350b0681945989b86760e616
Installing requirements for Web UI
Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [6b4e752dba] from E:\git\stable-diffusion-webui\models\Stable-diffusion\ModelNameV6-ep120-gs06481.ckpt
Creating model from config: E:\git\stable-diffusion-webui\models\Stable-diffusion\ModelNameV6-ep120-gs06481.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 8.4s (load weights from disk: 1.2s, create model: 1.3s, apply weights to model: 0.9s, apply half(): 2.3s, move model to device: 1.4s, load textual inversion embeddings: 1.3s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
  0%|                                                                                            | 0/6 [00:08<?, ?it/s]
Error completing request
Arguments: ('task(f36bc6pnz3g0q4u)', 0, 'placeholder prompt', 'placeholder negative prompt', [], <PIL.Image.Image image mode=RGBA size=1296x2064 at 0x287C8CF9C60>, None, None, None, None, None, None, 20, 0, 4, 0, 1, False, False, 1, 1, 7, 0.25, -1.0, -1.0, 0, 0, 0, False, 2064, 1296, 0, 0, 32, 0, '', '', '', [], 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', 0, '', 0, '', True, False, False, False) {}
Traceback (most recent call last):
  File "E:\git\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "E:\git\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "E:\git\stable-diffusion-webui\modules\img2img.py", line 168, in img2img
    processed = process_images(p)
  File "E:\git\stable-diffusion-webui\modules\processing.py", line 484, in process_images
    res = process_images_inner(p)
  File "E:\git\stable-diffusion-webui\modules\processing.py", line 626, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "E:\git\stable-diffusion-webui\modules\processing.py", line 1041, in sample
    samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
  File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 264, in sample_img2img
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args={
  File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 190, in launch_sampling
    return func()
  File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 264, in <lambda>
    samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args={
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 107, in forward
    x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": [tensor[a:b]], "c_concat": [image_cond_in[a:b]]})
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "E:\git\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "E:\git\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 776, in forward
    h = module(h, emb, context)
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
    x = layer(x, context)
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 324, in forward
    x = block(x, context=context[i])
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 259, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 129, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 262, in _forward
    x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\git\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 127, in split_cross_attention_forward
    s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k)
  File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\functional.py", line 378, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.03 GiB (GPU 0; 24.00 GiB total capacity; 2.25 GiB already allocated; 19.05 GiB free; 2.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The GPU has 4476 MiB of its VRAM allocated after I do this; I'm not sure if that is higher than usual.
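
For reference, the 26.03 GiB request in the traceback above is roughly what the self-attention score matrix costs at this resolution, so the number itself is not mysterious. A back-of-envelope check, where fp16 storage, 8 attention heads, a cond+uncond batch of 2, and the split-attention path halving the query dimension once are all assumptions rather than values read from the code:

```python
# Rough size of the attention score matrix behind the failing einsum
# (every constant below is an assumption for illustration, not read from the code).
height, width = 2064, 1296
tokens = (height // 8) * (width // 8)   # 258 * 162 = 41,796 latent tokens
batch = 2                               # cond + uncond
heads = 8                               # assumed SD 1.x attention head count
bytes_per_element = 2                   # fp16

full = batch * heads * tokens**2 * bytes_per_element
print(f"full score matrix: {full / 2**30:.2f} GiB")     # prints ~52.06
print(f"half-split chunk:  {full / 2 / 2**30:.2f} GiB")  # prints ~26.03, matching the error
```

If those assumptions hold, the request is simply the quadratic cost of the non-memory-efficient attention path at 1296x2064, independent of any leak elsewhere.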


Myridium commented 1 year ago

@hopto-dot thanks for sharing your console log. With 24 GB of VRAM this is quite extreme.

thevoidwatches commented 1 year ago

I'm having the same error. While I don't have nearly as much VRAM as you (only 8 GB), I was still able to run generations at 800x450 or 480x600 and upscale them at x2. Now I can only upscale the 480x600 at x1.5 and I can't upscale the 800x450 at all, getting the same memory error when I do.

NilPtrDeref commented 1 year ago

I'm having the same problem. Mine is trying to allocate over 10 GB for a 2x upscale of a 512x768.

ireallyamaperson commented 1 year ago

I am experiencing full use of the 24 GB of VRAM on a 4090, and images failing to complete, when using 'Restore faces', seemingly at random. After many tests it seems to relate specifically to the subject matter and how closely a face resembles a real person to begin with.

Easily discernible faces such as regular men and women complete fine; however, full usage (to the point of it hanging completely) seems to occur with subjects such as demons, goblins, and some anthropomorphic animals.

Awethon commented 1 year ago

As a temporary fix I did a `git checkout` to cc8c9b7474d917888a0bd069fcd59a458c67ae4b (the last commit on Jan 27), and the hi-res fix works well. So I guess the breaking commit is somewhere in the Jan 28 - Feb 1 range.

And this issue is not OS/GPU/sampler specific; I experienced it with Ubuntu/A4000/DDIM.

thevoidwatches commented 1 year ago

I can confirm that rolling back to cc8c9b7 has solved the issue for the moment.

someuser22 commented 1 year ago

I'm sorry if this comment is approximate, but I don't have git knowledge and I don't have a baseline to compare against, since I only started using the web UI a few days ago. I have a GTX 1650 mobile with 4 GB VRAM on Windows 11. I am generating at 832x704 with DPM++ SDE Karras, using the --xformers and --medvram flags. If I don't use --medvram, the first generation goes fine, but I get a memory allocation error right at the end of the second generation. Restarting the web UI fixes the problem for one generation. I was wondering whether this is intended behaviour, but it's not easy to find information about low-VRAM cards on the internet. I typed `git reset --hard cc8c9b7` on the command line but I'm not sure that's enough to get that commit.

Myridium commented 1 year ago

I'm sorry if this comment is approximate, but I don't have git knowledge and I don't have a baseline to compare against, since I only started using the web UI a few days ago. I have a GTX 1650 mobile with 4 GB VRAM on Windows 11. I am generating at 832x704 with DPM++ SDE Karras, using the --xformers and --medvram flags. If I don't use --medvram, the first generation goes fine, but I get a memory allocation error right at the end of the second generation. Restarting the web UI fixes the problem for one generation. I was wondering whether this is intended behaviour, but it's not easy to find information about low-VRAM cards on the internet. I typed `git reset --hard cc8c9b7` on the command line but I'm not sure that's enough to get that commit.

No, it's not normal; you're probably hitting the same bug.

Running `git checkout <hash>` will get the right commit. I think your command, `git reset --hard <hash>`, also does the trick.

Tempaccnt commented 1 year ago

I have noticed that if you keep using the same model, you're less likely to run into this issue. It feels like every time you change the model, a chunk of your VRAM goes with it, until you are out of memory and need to restart.

Myridium commented 1 year ago

I have noticed that if you keep using the same model, you're less likely to run into this issue. It feels like every time you change the model, a chunk of your VRAM goes with it, until you are out of memory and need to restart.

Yep. Similar things happen if you start out generating live previews and then turn them off. All signs of memory leaks.
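
The "a chunk of VRAM goes with every model switch" pattern is exactly what a single lingering reference to the old checkpoint would produce. A toy illustration (hypothetical code, not the webui's model-switching logic):

```python
import gc
import torch
import torch.nn as nn

def load_model() -> nn.Module:
    return nn.Linear(4096, 4096).half().cuda()  # ~32 MiB stand-in for a checkpoint

model = load_model()
stray_ref = model.weight  # e.g. a tensor cached somewhere else in the program
print(torch.cuda.memory_allocated() // 2**20, "MiB with the model loaded")

# "Switch models": drop the old checkpoint (a new one would be loaded here).
del model
gc.collect()
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated() // 2**20, "MiB after the switch - the old weight is still pinned")

del stray_ref
gc.collect()
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated() // 2**20, "MiB once every reference to the old model is gone")
```

Until every such reference is dropped, `torch.cuda.empty_cache()` cannot return the old model's VRAM, which would match the behaviour described above.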

notdelicate commented 1 year ago

I'm also experiencing issues with memory management. I can only generate 4 images before my desktop crashes and I have to hit Ctrl + Alt + Backspace to close the session. I used to be able to generate image after image; I don't know why this is happening.

Zueuk commented 1 year ago

I was still able to run generations at 800x450 or 480x600 and upscale them at x2. Now I can only upscale the 480x600 at x1.5 and I can't upscale the 800x450 at all, getting the same memory error when I do.

I only have 4 GB, but just a couple of months ago with --medvram I could render 1600x1024 and slightly above (though somehow not 1280x1280); now I sometimes get out-of-memory errors even at 1024x768.

What I find most absurd is when the out-of-memory error happens after all the low-res and then hi-res steps have completed; it's as if something is trying to use VRAM for some trivial task that could/should have been done in normal RAM...

rafx85 commented 1 year ago

As a temporary fix I did a `git checkout` to cc8c9b7 (the last commit on Jan 27), and the hi-res fix works well. So I guess the breaking commit is somewhere in the Jan 28 - Feb 1 range.

And this issue is not OS/GPU/sampler specific; I experienced it with Ubuntu/A4000/DDIM.

For me it did not help. I've had this issue since yesterday; also, when SD loads it now takes 3 GB of VRAM at startup, whereas yesterday it took only 0.2 GB. I don't know what has changed since yesterday, as I did not change any files.

Tempaccnt commented 1 year ago

I got it fixed. The problem in my case was that my version of PyTorch was 1.12, while the latest updates of the webui are designed to work with 1.13.

To fix the issue, just edit webui-user.bat and add `--reinstall-torch --reinstall-xformers` to the argument section. This will install the latest compatible versions of PyTorch and xformers; you can delete these arguments after everything is updated.

See if this works for you.

Cleanup-Crew-From-Discord commented 1 year ago

I can confirm I've had a similar issue, but while writing this I seem to have somehow come up with a temporary fix. nvtop shows that there are literally zero other processes using the GPU. If I accidentally generate a large image, it crashes with an out-of-memory error, but the memory it was attempting to use isn't cleared out. Trying to generate another image would fail, but if I tried to generate a sufficiently tiny image (256x256 or so) it would suddenly remember it needed to clear out that VRAM.

Strangely enough, when I disabled the "Show generation progress in window title" option after one of these failures and reloaded the UI, it still threw an out-of-memory error... but the progress bar in the command line kept running (not in the web UI though, it looked dead from there), and five or so seconds after it finished, it spat the image out and it showed up in the web UI as though nothing had happened. I'm not confident the setting change itself did much, but resetting the UI afterwards seemed to be the fix.

Launch options were `--xformers --opt-split-attention --opt-sub-quad-attention --share`; images were 512x512 upscaled 2x to 1024x1024 by the hi-res fix, on a 2060 (6 GB).
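
The "VRAM only clears after a sufficiently tiny follow-up generation" behaviour is consistent with the failed job's tensors staying reachable until something later triggers collection. A hedged sketch of the kind of cleanup a caller can do when catching the OOM itself (illustrative only; `run_job` is a hypothetical callable, and this is not the webui's actual error handling):

```python
import gc
import torch

def generate_with_cleanup(run_job):
    """Run a generation callable; on CUDA OOM, release what we can before re-raising."""
    try:
        return run_job()
    except torch.cuda.OutOfMemoryError:
        # Release whatever is already unreferenced and hand cached blocks back
        # to the driver right away, instead of leaving them until a later,
        # smaller job happens to trigger the cleanup. (Tensors still referenced
        # by the exception's traceback are only freed once the handler exits.)
        gc.collect()
        torch.cuda.empty_cache()
        raise
```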

Tempaccnt commented 1 year ago

What is your PyTorch version? If it's 1.12, update it to 1.13 (and as a bonus, update xformers too) and see if that fixes it. This solved all the VRAM issues that started appearing for me with the new updates of the webui.

Cleanup-Crew-From-Discord commented 1 year ago

Torch says it's 1.13.1, and xformers was freshly installed. I'm running on an utterly godawful temporary setup right now where half my RAM is actually HDD swap (it's a long story), so restarting takes ~15 minutes. Here are the versions I have installed (screenshot attached), just to make 100% sure I'm reading them correctly.

Tempaccnt commented 1 year ago

Torch says it's 1.13.1, and xformers was freshly installed. I'm running on an utterly godawful temporary setup right now where half my RAM is actually HDD swap (it's a long story), so restarting takes ~15 minutes. Here are the versions I have installed (screenshot attached), just to make 100% sure I'm reading them correctly.

Hmm, yeah, it should have worked with those versions, so it seems it's a different issue from mine. The only other advice I can give is to disable live previews if you have them on. Hopefully someone has a solution for you.

Cleanup-Crew-From-Discord commented 1 year ago

Hmm, I can't remember the last time I even updated SD itself... come to think of it, how do I upgrade my already installed version of Stable Diffusion again?

Tempaccnt commented 1 year ago

Copy the important folders/files (models, extensions, embeddings, style.xls, GFPGAN, output),

then delete the folder, re-clone the AUTOMATIC1111 repository, and move the files you copied back in. This is how I usually do it.

PS: maybe consider not copying your extensions folder back in when you first run it, because extensions often cause issues when you update the webui, so they might actually be the cause of your problems to begin with.

rafx85 commented 1 year ago

Hmm, I can't remember the last time I even updated SD itself... come to think of it, how do I upgrade my already installed version of Stable Diffusion again?

You can add `git pull` to webui-user.bat and also add `--medvram`; it helped with my problem of SD taking too much VRAM.

git pull

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS= --xformers --medvram --share 

call webui.bat

Cleanup-Crew-From-Discord commented 1 year ago

add `--medvram`; it helped with my problem of SD taking too much VRAM

Disabling image previews mostly solved my issue, but I'll try throwing this in too so I can make even larger images.

2blackbar commented 1 year ago

Split attention v1 fixes the OOM issues: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/8394

GrgMdmn commented 12 months ago

Hi! I have the same error with my RX 6800 GPU. I have 16 GB of VRAM, but I get this error:

RuntimeError: Could not allocate tensor with 7833840 bytes. There is not enough GPU video memory available! I do not understand, since 7,833,840 bytes is only about 0.007 GB...

Edit: fixed with `set COMMANDLINE_ARGS=--opt-sub-quad-attention --medvram --disable-nan-check --autolaunch` in the webui-user.bat file :)