Open Myridium opened 1 year ago
What sampler do you use? I reported a GPU memory leak (when using DPM++ SDE Karras) a while ago and also cross-posted it to the K-Diffusion repo. (The bug report disappeared after a few hours!) #6678
I use euler A and experience the memory issue.
Usually DPM++ 2M Karras.
I can confirm, there has been some recent commit that no longer allows me to perform an img2img with as high dimensions as before (2-3 days ago).
As a test I used the following parameters for img2img:
Sampling method: Euler a
Batch count and Batch size: 1
CFG Scale: 7
Dimensions: 1296 x 2064
Denoising strength: 0.25
Steps: 20
Script: None
I did this on an RTX 3090 (24gb) with a clean installation and I get the following errors:
venv "E:\git\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: 226d840e84c5f306350b0681945989b86760e616
Installing requirements for Web UI
Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [6b4e752dba] from E:\git\stable-diffusion-webui\models\Stable-diffusion\ModelNameV6-ep120-gs06481.ckpt
Creating model from config: E:\git\stable-diffusion-webui\models\Stable-diffusion\ModelNameV6-ep120-gs06481.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 8.4s (load weights from disk: 1.2s, create model: 1.3s, apply weights to model: 0.9s, apply half(): 2.3s, move model to device: 1.4s, load textual inversion embeddings: 1.3s).
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
0%| | 0/6 [00:08<?, ?it/s]
Error completing request
Arguments: ('task(f36bc6pnz3g0q4u)', 0, 'placeholder prompt', 'placeholder negative prompt', [], <PIL.Image.Image image mode=RGBA size=1296x2064 at 0x287C8CF9C60>, None, None, None, None, None, None, 20, 0, 4, 0, 1, False, False, 1, 1, 7, 0.25, -1.0, -1.0, 0, 0, 0, False, 2064, 1296, 0, 0, 32, 0, '', '', '', [], 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', 0, '', 0, '', True, False, False, False) {}
Traceback (most recent call last):
File "E:\git\stable-diffusion-webui\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "E:\git\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "E:\git\stable-diffusion-webui\modules\img2img.py", line 168, in img2img
processed = process_images(p)
File "E:\git\stable-diffusion-webui\modules\processing.py", line 484, in process_images
res = process_images_inner(p)
File "E:\git\stable-diffusion-webui\modules\processing.py", line 626, in process_images_inner
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
File "E:\git\stable-diffusion-webui\modules\processing.py", line 1041, in sample
samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 264, in sample_img2img
samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args={
File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 190, in launch_sampling
return func()
File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 264, in <lambda>
samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args={
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 107, in forward
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond={"c_crossattn": [tensor[a:b]], "c_concat": [image_cond_in[a:b]]})
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "E:\git\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "E:\git\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "E:\git\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
return self.__orig_func(*args, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1329, in forward
out = self.diffusion_model(x, t, context=cc)
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 776, in forward
h = module(h, emb, context)
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
x = layer(x, context)
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 324, in forward
x = block(x, context=context[i])
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 259, in forward
return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 114, in checkpoint
return CheckpointFunction.apply(func, len(inputs), *args)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 129, in forward
output_tensors = ctx.run_function(*ctx.input_tensors)
File "E:\git\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 262, in _forward
x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "E:\git\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 127, in split_cross_attention_forward
s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k)
File "E:\git\stable-diffusion-webui\venv\lib\site-packages\torch\functional.py", line 378, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.03 GiB (GPU 0; 24.00 GiB total capacity; 2.25 GiB already allocated; 19.05 GiB free; 2.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
GPU has 4476MiB of its VRAM allocated after I do this - I'm not sure if that is higher than usual?
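For what it's worth, the 26.03 GiB figure in the traceback above can be reproduced with back-of-the-envelope arithmetic. The sketch below rests on assumptions the log does not state outright: an 8x downscale from pixels to latents, 8 attention heads in the first self-attention layer (typical for SD 1.x models), and fp16 scores (the log does mention `apply half()`). The function name is made up for illustration.

```python
# Rough size of one unsliced self-attention score tensor.
# Assumptions (not confirmed by the log): 8x VAE downscale,
# 8 attention heads, 2 bytes per element (fp16).

def attention_scores_gib(width, height, heads=8, bytes_per_el=2, downscale=8):
    tokens = (width // downscale) * (height // downscale)  # latent positions
    total_bytes = heads * tokens * tokens * bytes_per_el   # heads x tokens x tokens
    return total_bytes / 2**30

print(f"{attention_scores_gib(1296, 2064):.2f} GiB")  # 26.03 GiB
```

If those assumptions hold, the failed request is the size of the whole attention matrix built in one piece, which would fit the traceback ending in split_cross_attention_forward. That points at the slicing logic rather than at leaked memory, though this is only an inference from the numbers.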
A few other things:
- --medvram and --lowvram: not even this helps.
- Tried to allocate 42.55 GiB (GPU 0; 24.00 GiB total capacity; 2.31 GiB already allocated; 18.86 GiB free; 2.90 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF - I certainly didn't require almost 43GiB of VRAM.
@hopto-dot thanks for sharing your console log. With 24GB VRAM this is quite extreme.
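Both allocation requests quoted so far (26.03 GiB and 42.55 GiB) are larger than the card's total 24 GiB, so no allocator setting such as max_split_size_mb could have satisfied them. A minimal sketch for pulling the numbers out of such a message follows; the regex is written against the message format shown in this thread only (GiB units, PyTorch 1.13-era wording), and the helper name `parse_oom` is hypothetical.

```python
import re

# Matches the CUDA OOM wording quoted in this thread (GiB units only).
OOM_RE = re.compile(
    r"Tried to allocate (?P<requested>[\d.]+) GiB "
    r"\(GPU \d+; (?P<total>[\d.]+) GiB total capacity; "
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved"
)

def parse_oom(message):
    """Return the figures from an OOM message as floats, or None if no match."""
    m = OOM_RE.search(message)
    if m is None:
        return None
    return {key: float(value) for key, value in m.groupdict().items()}

msg = ("Tried to allocate 42.55 GiB (GPU 0; 24.00 GiB total capacity; "
       "2.31 GiB already allocated; 18.86 GiB free; "
       "2.90 GiB reserved in total by PyTorch)")
info = parse_oom(msg)
print(info["requested"] > info["total"])  # True
```

Messages that report sizes in MiB, or that come from other PyTorch versions, would need a looser pattern.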
I'm having the same error. While I don't have nearly as much VRAM as you (only 8gb) I was still able to run generations at 800x450 or 480x600 and upscale them at x2. Now I can only upscale the 480x600 at x1.5 and I can't upscale the 800x450 at all, getting the same memory error when I do.
I'm having the same problem. Mine is trying to allocate over 10Gb for 2x on a 512x768.
I am experiencing full use of 24gb of ram on a 4090 and the image failing to complete when using 'restore faces' seemingly randomly. After many tests it seems to relate specifically to subject matter and how close a face resembles a real person to begin with.
Easy to discern faces such as regular men and women complete fine, however full usage (to the point of it hanging completely) seem to occur on subjects such as demons, goblins and some anthropomorphic animals.
As a temporary fix I did git checkout to cc8c9b7474d917888a0bd069fcd59a458c67ae4b (last commit on Jan 27). Highres fix works well, so I guess the breaking commit is somewhere in the Jan 28 - Feb 1 range.
And this issue is not OS/GPU/Sampler specific, I experienced it with Ubuntu/A4000/DDIM.
I can confirm that rolling back to cc8c9b7 has solved the issue for the moment.
I'm sorry if this comment will be approximate but I don't have git knowledge and don't have a comparison to make since I started using the web ui a few days ago. I have a GTX 1650 mobile with 4 GB VRAM, Windows 11. I am generating at 832x704 with DPM++ SDE Karras, using the --xformers and --medvram flags. If I don't use --medvram, the first generation goes fine, but I get a memory allocation error right at the end of the second generation. Restarting the web ui fixes the problem for a generation. I was wondering if this was intended behaviour or not, but it's not easy to find information about low VRAM cards on the internet. I typed git reset --hard cc8c9b7 in the command line but I'm not sure it's enough to get that commit.
It's not normal, no. It's probably the same bug you're experiencing. Running git checkout <hash> will get the right commit. I think your command git reset --hard <hash> also does the trick.
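For anyone unsure whether the rollback took effect, one way to check is to compare the hash that `git rev-parse HEAD` prints against the commit you wanted. A small sketch, assuming git is on the PATH; the helper names are invented for illustration and the repository path is whatever your install uses.

```python
import subprocess

def current_commit(repo_dir="."):
    """Return the full hash of HEAD in repo_dir (requires git on PATH)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def is_at_commit(head_hash, wanted_prefix):
    """True if the full hash starts with the (possibly abbreviated) wanted hash."""
    return head_hash.lower().startswith(wanted_prefix.lower())

# Example using the full hash quoted earlier in this thread:
full_hash = "cc8c9b7474d917888a0bd069fcd59a458c67ae4b"
print(is_at_commit(full_hash, "cc8c9b7"))  # True
```

In practice you would call `is_at_commit(current_commit("path/to/stable-diffusion-webui"), "cc8c9b7")` from any Python prompt.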
I have noticed that if you keep using the same model, you're less likely to run into this issue. it feels like every time you change a model a chunk of your VRAM goes with it until you are out of memory and need to restart it.
Yep. Similar things happen if you start with generating image previews, then turn them off. All signs of memory leaks.
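One way to turn "a chunk of VRAM goes with every model change" into something checkable is to record memory use at the same idle point after each switch and see whether it only ever climbs. The sketch below is a plain-Python heuristic; the 50 MiB threshold and the sample readings are made up, and with PyTorch the readings could come from `torch.cuda.memory_allocated() / 2**20`.

```python
def looks_like_leak(readings_mib, min_growth_mib=50):
    """Return True if every reading is at least min_growth_mib above the previous one.

    readings_mib: memory in use at the same idle point after each model
    switch or generation, in MiB.
    """
    if len(readings_mib) < 3:
        return False  # too few samples to call it a trend
    steps = [later - earlier for earlier, later in zip(readings_mib, readings_mib[1:])]
    return all(step >= min_growth_mib for step in steps)

# Hypothetical readings, not measurements from this thread:
print(looks_like_leak([2300, 2900, 3500, 4100]))  # True
print(looks_like_leak([2300, 2310, 2295, 2305]))  # False
```

Steady growth at an idle point is only suggestive; caching inside the allocator can produce a similar pattern for the first few runs.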
I'm also experiencing issues with memory management. I can only generate 4 images before my desktop crashes and I have to hit Ctrl + Alt + Backspace to close the session. I used to be able to generate image after image; I don't know why this is happening.
I was still able to run generations at 800x450 or 480x600 and upscale them at x2. Now I can only upscale the 480x600 at x1.5 and I can't upscale the 800x450 at all, getting the same memory error when I do.
I've only 4 GB, but just a couple of months ago with --medvram I could render 1600x1024 and slightly above (but somehow not 1280x1280). Now I sometimes get out-of-memory errors even at 1024x768.
What I find the most absurd is when the Out of memory error happens after all the low-res and then hi-res steps are completed - it's like something is trying to use VRAM for some trivial task that could/should have been done in the normal RAM...
That did not help for me. I have had this issue since yesterday. Also, when SD loads it now takes 3 GB of VRAM at the start, where before it took only 0.2 GB. I don't know what has changed since yesterday; I did not make any changes to the files.
I got it fixed. The problem in my case was that my version of pytorch was 1.12, while the latest updates of the webui are designed to work with 1.13.
To fix the issue, just edit webui-user.bat by adding --reinstall-torch --reinstall-xformers in the argument section. This will install the latest compatible versions of pytorch and xformers. You can delete these arguments after everything is updated.
See if this works for you.
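To check whether an install is on the older release before reinstalling anything, the version string that `torch.__version__` reports can be compared against 1.13. A minimal sketch; the helper names are invented and the "+cu113"/"+cu117" suffixes are just example build tags, not values taken from this thread.

```python
import re

def version_tuple(version):
    """'1.13.1+cu117' -> (1, 13, 1); ignores any local suffix after '+'."""
    core = version.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])

def is_at_least(version, minimum):
    """Compare two dotted version strings numerically."""
    return version_tuple(version) >= version_tuple(minimum)

print(is_at_least("1.12.1+cu113", "1.13"))  # False
print(is_at_least("1.13.1+cu117", "1.13"))  # True
```

With PyTorch importable, `is_at_least(torch.__version__, "1.13")` gives the answer for the active venv.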
Can confirm I've had a similar issue, but while writing this I seem to have somehow come up with a temporary fix. nvtop shows that there are literally 0 other processes using the GPU. If I accidentally generate a large image, it crashes due to out of memory, but the memory it was attempting to use isn't cleared out. Trying to generate another image would fail, but if I tried to generate a sufficiently tiny image (256x256 or so) it would suddenly remember it needed to clear out that VRAM. Strangely enough, when I disabled the "Show generation progress in window title" option after one of these failures and reloaded the UI, it still threw an out of memory error... but the progress bar in the command line kept running (not in the webui though, it looks dead from there), and five or so seconds after it finished, it spat the image out and it showed up in the webui as though nothing had happened. I'm not confident that the actual setting change did much, but resetting the UI afterwards seemed to be the fix.
launch options were "--xformers --opt-split-attention --opt-sub-quad-attention --share", images were 512x512 upscaled 2x to 1024x1024 by hires fix, with a 2060 6gb.
what is your pytorch version? if it's 1.12 update it to 1.13(as a bonus update xformers too) and see if it's fixed. this solved all my VRAM issues that started appearing with the new updates of the webui
torch says it's 1.13.1, and xformers was freshly installed. I'm running on an utterly godawful temporary setup right now where half my ram is actually HDD swap (its a long story), so restarting takes ~15 minutes.
here's the versions I have installed, just to make 100% sure i'm reading them correctly
hmm, yeah it should have worked with those versions. it seems it's a different issue to mine then. the only other advice I could give now, is to disable live preview if you have it ON. hopefully someone has a solution for you.
hmm, i can't remember the last time I even updated SD itself... come to think of it, how can I upgrade my already installed version of stable diffusion again?
Copy the important folders/files (models, extensions, embeddings, styles.csv, GFPGAN, output),
then delete the folder, reclone the automatic1111 repository, and transfer the files you copied earlier back in. This is how I usually do it.
PS: maybe consider not pasting your extensions folder back when you run it for the first time, because extensions often cause issues when you update the webui. They might actually be the cause of your problems to begin with.
You can add git pull to webui-user.bat and also add --medvram. It helped with my problem of taking too much VRAM.
git pull
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS= --xformers --medvram --share
call webui.bat
add --medvram it helped with my taking too much VRAM problem
Disabling image previews mostly solved my issue, but i'll try throwing this in too so I can make even larger images
split attention v1 fixes OOM issues https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/8394
Hi! I have the same error with my RX 6800 GPU. I have 16 GB of VRAM, but I get this error:
RuntimeError: Could not allocate tensor with 7833840 bytes. There is not enough GPU video memory available! I do not understand, since 7833840 bytes = 0.007 GB...
Edit: fixed with set COMMANDLINE_ARGS=--opt-sub-quad-attention --medvram --disable-nan-check --autolaunch in the webui-user.bat file :)
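The arithmetic in that comment checks out, as this quick conversion shows. The only input is the byte count from the RuntimeError; the variable name is arbitrary.

```python
failed_request = 7_833_840  # bytes, from the RuntimeError quoted above

print(f"{failed_request / 2**20:.2f} MiB")  # 7.47 MiB
print(f"{failed_request / 1e9:.4f} GB")     # 0.0078 GB
```

A request this small failing on a 16 GB card suggests the memory was already almost entirely in use when this tensor was requested, rather than this tensor being the large one. That is an inference, not something the message itself states.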
Is there an existing issue for this?
What happened?
Frequent CUDA out-of-memory errors.
These have been reported several times by other users; however, those reports have blamed specific features like the hi-res fix or image preview generation, or some particular commit. I think there is something decidedly wrong with the memory management of stable-diffusion-webui altogether, and that the other complaints are just manifestations of this poor memory management.
I have been getting inconsistent out-of-memory errors from CUDA despite having demonstrably more than double the amount of VRAM required to complete the task of upscaling. (I have successfully upscaled two images simultaneously a few times, but it rarely works.)
I have been using this program from a frame buffer in Linux to ensure that nothing else is touching the GPU. Indeed, nvidia-smi confirms that the GPU is actually even powered off before stable-diffusion-webui is launched. I repeat: nothing is touching the GPU except for this program, and this program produces CUDA out-of-memory errors on an inconsistent basis, in situations where I have previously completed a task requiring double the amount of VRAM.
I have received out-of-memory errors where there was sufficient VRAM available but it could not be allocated (which implies fragmentation). And I have received out-of-memory errors where there was insufficient VRAM available.
It seems clear that there is a VRAM memory leak in this program.
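The failure modes described above (memory that exists but cannot be allocated, versus memory that is simply used up) can be told apart roughly from the figures in the error message. The sketch below is a simplified triage with labels of its own invention; it ignores memory that PyTorch has reserved but not allocated, and the second and third examples use hypothetical numbers.

```python
def classify_oom(requested_gib, free_gib, total_gib):
    """Rough triage of a failed allocation; the labels are this sketch's own."""
    if requested_gib > total_gib:
        return "oversized"      # can never fit on this card
    if requested_gib <= free_gib:
        return "fragmentation"  # enough free memory, but not in one usable block
    return "exhaustion"         # not enough free memory at that moment

print(classify_oom(26.03, 19.05, 24.00))  # oversized
print(classify_oom(4.0, 6.0, 24.00))      # fragmentation (hypothetical numbers)
print(classify_oom(4.0, 1.5, 24.00))      # exhaustion (hypothetical numbers)
```

The first call uses the figures from the RTX 3090 log posted in the comments, which falls into neither of the two cases above: the request was larger than the whole card.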
Steps to reproduce the problem
Do anything memory-intensive. Upscale multiple images in batches. Generate live previews while upscaling a single image. That kind of thing. Live previews appear to make the issue significantly worse.
What should have happened?
If I am able to upscale two images simultaneously, then this demonstrates that there is sufficient VRAM for the task. This should be repeatable, without out-of-memory errors.
The program should properly allocate and release VRAM, and it should explicitly defragment the memory if necessary.
Commit where the problem happens
I've tried a few, dating back to about 2023-01-20.
What platforms do you use to access the UI ?
No response
What browsers do you use to access the UI ?
No response
Command Line Arguments
List of extensions
None in use.
Console logs
Additional information
Please write unit tests and get to the bottom of the memory leak.