AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: VRAM usage is way higher #6307

Closed shimizu-izumi closed 1 year ago

shimizu-izumi commented 1 year ago

Is there an existing issue for this?

What happened?

I updated the WebUI a few minutes ago and now the VRAM usage when generating an image is way higher. I have 3 monitors (2x 1920x1080 & 1x 2560x1440) and I use Wallpaper Engine on all of them, but I have Discord open on one of them nearly 24/7, so Wallpaper Engine is only active for two monitors. 1.5 GB of VRAM is used when I am on the desktop without the WebUI running.

Web Browser: Microsoft Edge (Chromium)
OS: Windows 11 (Build number: 22621.963)
GPU: NVIDIA GeForce RTX 3070 Ti (KFA2)
CPU: Intel Core i7-11700K
RAM: Corsair VENGEANCE LPX 32 GB (2 x 16 GB) DDR4 DRAM 3200 MHz C16

Steps to reproduce the problem

  1. Start the WebUI
  2. Use the following settings to generate an image

Positive prompt: masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name
Steps: 50, Sampler: Euler a, CFG scale: 12, Seed: 3607441108, Size: 512x768, Model hash: 8d9aaa54, Model: Anything V3 (non pruned with vae), Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo

What should have happened?

The generation should complete without any errors

Commit where the problem happens

1cfd8aec4ae5a6ca1afd67b44cb4ef6dd14d8c34

What platforms do you use to access UI ?

Windows

What browsers do you use to access the UI ?

Microsoft Edge

Command Line Arguments

--xformers

Additional information, context and logs

I have the config for animefull from the Novel AI leak in the configs folder under the name Anything V3.0.yaml, but I get this error too when I remove it from the configs folder and completely restart the WebUI. This is the error I get

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 4.70 GiB already allocated; 0 bytes free; 5.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
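For reference, the allocator hint mentioned at the end of that error can be set as an environment variable before launch. This is only a minimal workaround sketch, assuming webui-user.bat is used and with an arbitrary example value; it mitigates fragmentation and does not address the regression itself:

rem In webui-user.bat, before webui.bat is called (value is an example):
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
set COMMANDLINE_ARGS=--xformers
call webui.bat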

ClashSAN commented 1 year ago

When did you last update the webui? This may be from a Windows update. You may want to disable browser hardware acceleration. I've found the openoutpaint extension automatically uses some VRAM with browser hardware acceleration.

walkerakiz commented 1 year ago

Same issue here. For a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. This happened with the new update today. :/

shimizu-izumi commented 1 year ago

> When did you last update the webui? This may be from a Windows update. You may want to disable browser hardware acceleration. I've found the openoutpaint extension automatically uses some VRAM with browser hardware acceleration.

I updated the WebUI around 2 PM UTC+1. The last major Windows update was a few weeks ago. When I used the WebUI a few days ago, everything still worked without any errors, and I don't have the openoutpaint extension.

mxzgithub commented 1 year ago

I made a fresh install just now with an RTX 4090. I'm running out of VRAM constantly, which never happened before.

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Alphyn-gunner commented 1 year ago

> Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN AnimeVideo

I might be mistaken, but I think the culprit is the new Highres fix. It upscales the images before processing them a second time, and they may be too big to fit into your VRAM. I see a lot of people complaining about how confusing it is to use and how it gives inferior results. In my experience it's of questionable usability right now as well.

If you really need to use the Highres fix now, try setting the upscaling factor to 1. It somehow makes it behave, even though it's counter-intuitive and the default setting is 2. Here are some examples I got: default settings (upscale by 2) vs. upscale by 1 (images attached).

On the other hand, I just noticed that you have a lot of RAM, so it makes me think I'm completely wrong in my assumption and there is something else entirely going on. I'm going to try to use your settings with the same model and see what I get on 8 GB.

Alphyn-gunner commented 1 year ago

Here's the result I got: `masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn, cumulonimbus clouds, lighting, blue sky, falling leaves, garden Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, Steps: 50, Sampler: Euler a, CFG scale: 7, Seed: 3607441108, Size: 512x768, Model: Anything-V3.0-pruned-fp32, Denoising strength: 0.69, Clip skip: 2, Hires upscale: 2, Hires upscaler: R-ESRGAN 4x+ Anime6B

Time taken: 4m 49.25s. Torch active/reserved: 4777/6598 MiB, Sys VRAM: 8192/8192 MiB (100.0%)`

It used all the available memory, but didn't run out. It also made the image twice the size I asked for, and it took almost 5 minutes on a 1070 Ti.

Commit hash: 24d4a0841d3cc0e5908b098f65a9caa3fa889af8

shimizu-izumi commented 1 year ago

@Alphyn-gunner It's twice the size because of the hires upscale value.

shimizu-izumi commented 1 year ago

I also noticed that I now get completely different results with the exact same settings.

ClashSAN commented 1 year ago

> I made a fresh install just now with an RTX 4090. I'm running out of VRAM constantly, which never happened before.
>
> RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 23.99 GiB total capacity; 12.81 GiB already allocated; 0 bytes free; 21.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Could you post the before and after image size limit?

> I also noticed that I now get completely different results with the exact same settings.

Were you using xformers?

lolxdmainkaisemaanlu commented 1 year ago

I have the same problem, and I don't even use the hires fix! I just do normal generations, but the VRAM usage is WAY higher now! I can't do the same batch size that I used to be able to do previously! Everything else is the same, I changed nothing; I only did a git pull.

Campfirecrucifix commented 1 year ago

> Same issue here. For a simple 5.x5 I can't even generate with the normal SD 2.1 model or any upscale. This happened with the new update today. :/

I honestly thought I was the only one. Generating images is SO much slower now (and I have a 4090). I really wish there was a way to revert back to the previous update.

> I also noticed that I now get completely different results with the exact same settings.

I'm also getting the same problem. I was wondering why hires was taking so long now, so I decided to recreate one of my previous images; I got nothing like it with all the same settings, and it took forever.

mykeehu commented 1 year ago

In the latest versions, the hires fix has been modified. Does the 5f4fa942b8ec3ed3b15a352903489d6f9e6eb46e version also have this bug?

GarbageHaus commented 1 year ago

For what it's worth, I've also noticed this when training an embedding after updating today via a fresh install. I have an old version, from how the repository was as of 11/5, which doesn't have any issues. I have a lower-end card (RTX 2060 6 GB), so embeddings are all I can do for the moment.

Previously I could train a 512x512 embedding and use the "Read parameters" option on the SD 1.4 checkpoint. The message I get now states that 512 MB of additional VRAM is needed. For experimentation, I lowered the 512 values and the embedding began to train. However, when it tried to generate an image mid-training, the CUDA memory issue occurred again.

It is worth noting that I'm able to use regular prompts as well as the embedding that was terminated early after running out of memory. So this might be helpful in determining what the cause is.

nonetrix commented 1 year ago

Same here. As suggested, using a less extreme upscale option worked; however, it is still considerably slower. Having different hires fix backends is nice and might yield better results, but why is this the only option? Why not add both?

What is the last known commit that doesn't have this change? I think I'll switch back to that for the time being.

Nilok7 commented 1 year ago

The current Hires. fix seems to be tuned much more for higher-end cards. It would be very helpful if there were a way to tune the Hires. fix back to the previous behavior, either as a direct option or via an update to the wiki, for 8 GB and lower cards.

DrGunnarMallon commented 1 year ago

For now you could always checkout a previous version:

git checkout fd4461d44c7256d56889f5b5ed9fb660a859172f

This is the one I'm using for the time being as I find the system pretty much unusable as it is now.
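For anyone unfamiliar with pinning a commit, a minimal sketch of the full round trip, run from inside the stable-diffusion-webui folder (the hash is the one suggested above; checking out master again later brings the current hires fix behavior back):

git checkout fd4461d44c7256d56889f5b5ed9fb660a859172f
rem run webui-user.bat as usual; later, to return to the latest version:
git checkout master
git pull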

shimizu-izumi commented 1 year ago

Yes, I use xformers. What do you mean by image size limit?

nanafy commented 1 year ago

I have the same issue. I found it while using the hires fix. I completely understand how to use it; that's not the issue. Now I run out of VRAM at the same batch sizes/dimensions as before. @lolxdmainkaisemaanlu also pointed out the same, except they are not even using hires fix; I just happened to notice it on hires. It seems to be an issue independent of the hires fix. Reverting to fd4461d as well, courtesy of @DrGunnarMallon.

DoughyInTheMiddle commented 1 year ago

> For now you could always checkout a previous version:
>
> git checkout fd4461d
>
> This is the one I'm using for the time being as I find the system pretty much unusable as it is now.

I'm running A1111 on a 2060 Super, so 8GB of VRAM.

I had a bit of a workflow where I do a couple of 512x512 low-level passes, then bump it up to 768 to start getting in detail, finally finishing off and upscaling to 1024. I've been doing passes of this process for almost a week (I've been making daily "Twelve Days of Christmas" images).

Even on my older card, it worked. Now, even going from 512 to 768 with just 50 steps, it just wrecks. I currently cannot render anything at 768x768.

I tried resetting to the hash recommended above, but I'm still going OOM. Is there another hash, prior to that one, you'd recommend reverting to?

Error completing request
Arguments: (0, 'a photograph of  a single red apple, on a yellow plate, on a blue checkered tablecloth.', '', 'None', 'None', <PIL.Image.Image image mode=RGBA size=512x512 at 0x1EFB7F20DF0>, None, None, None, None, 0, 50, 0, 4, 0, 1, False, False, 1, 4, 7, 0.2, 1254105237.0, -1.0, 0, 0, 0, False, 768, 768, 0, False, 32, 0, '', '', 0, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, None, None, '', '', '', '', 'Auto rename', {'label': 'Upload avatars config'}, 'Open outputs directory', 'Export to WebUI style', True, {'label': 'Presets'}, {'label': 'QC preview'}, '', [], 'Select', 'QC scan', 'Show pics', None, False, False, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 'Positive', 0, ', ', True, 32, 1, '', 0, '', True, False, False) {}
Traceback (most recent call last):
  File "G:\GitHub\SDWebUI\modules\call_queue.py", line 45, in f
    res = list(func(*args, **kwargs))
  File "G:\GitHub\SDWebUI\modules\call_queue.py", line 28, in f
    res = func(*args, **kwargs)
  File "G:\GitHub\SDWebUI\modules\img2img.py", line 152, in img2img
    processed = process_images(p)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 471, in process_images
    res = process_images_inner(p)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 541, in process_images_inner
    p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
  File "G:\GitHub\SDWebUI\modules\processing.py", line 887, in init
    self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 526, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 138, in forward
    h = self.norm2(h)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\modules\normalization.py", line 272, in forward
    return F.group_norm(
  File "G:\GitHub\SDWebUI\venv\lib\site-packages\torch\nn\functional.py", line 2516, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 8.00 GiB total capacity; 5.29 GiB already allocated; 0 bytes free; 6.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

nanafy commented 1 year ago

Try 4af3ca5393151d61363c30eef4965e694eeac15e. The other commit was throwing errors for me as well. I'm currently back up and running like I was before trying to get the latest build.

DoughyInTheMiddle commented 1 year ago

> Try 4af3ca5. The other commit was throwing errors for me as well. I'm currently back up and running like I was before trying to get the latest build.

That one isn't working for me either. Still going OOM.

After running git checkout xxxxxx, is there anything else I need to do other than closing the console and restarting?

nanafy commented 1 year ago

When you open your auto1111 cmd window, it tells you the commit version as soon as you run webui.bat. Does it say:

Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e
Installing requirements for Web UI...
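If the console output is unclear, git itself can confirm which commit is checked out — a small sketch, assuming git is on the PATH and the commands are run from the stable-diffusion-webui folder:

git rev-parse HEAD
rem or, for a one-line summary of the checked-out commit:
git log -1 --oneline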

DoughyInTheMiddle commented 1 year ago

I restored back to the master branch, and NVIDIA just put out a driver update.

One of those two things must have affected it, so at least I'm getting things to work better. Memory usage SEEMS better. Still watching it for a bit, though.

nonetrix commented 1 year ago

Did you add git pull to your webui script? I've seen a few people do that. For me at least, reverting back to an old version fixed it. Funny, because this change made me think xformers was the issue; I guess I'll have to give it another chance, I was harsh.

DoctorPavel commented 1 year ago

I'm not sure how related this is, but I haven't seen anybody else mention it. Loading a model in the webui, including at launch, has a coin flip's chance of maxing out my 8 GB of VRAM instantly and freezing my PC entirely. Has anybody else experienced this issue? This has been a thing for a few pulls now, even before the suspension. I have been running the webui inside a Docker image on Ubuntu 20.04 with ROCm and an RX 5700 XT AMD card.

ChinatsuHS commented 1 year ago

I'm having the same issue: just loading the WebUI immediately uses, and keeps using, 5 out of the 8 GB of VRAM, ever since the new hires fix was implemented. The most common error it OOMs on has to do with resolution scaling (even with hires fix disabled). I am not using SD 2.x models at all, so those should not be the issue.

With each generation, the amount of VRAM in use seems to increase by a few MB, which stacks up fast over time. img2img is a no-go at all, as it immediately OOMs.

ImBadAtNames2019 commented 1 year ago

Same issue here.

RuntimeError: CUDA out of memory. Tried to allocate 76.38 GiB (GPU 0; 12.00 GiB total capacity; 2.57 GiB already allocated; 7.19 GiB free; 2.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Time taken: 16.44s. Torch active/reserved: 2757/2774 MiB, Sys VRAM: 5051/12288 MiB (41.11%)

Centurion-Rome commented 1 year ago

See possible source in "new hires": https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/6725

mykeehu commented 1 year ago

I do not use Hires fix, but I can no longer change models on Colab because it causes a memory overflow:

image

The --lowram, --lowvram and --medvram options did not help. This is the default RAM reservation at startup:

image

Update: I found a solution:

image

Regardless, I saw that every time I change the model, it occupies 1 GB more memory, so after a while it causes a memory overflow again.

Mistborn-First-Era commented 1 year ago

I have this problem as well. It consists of:

  1. When I open the webui, my VRAM usage is at 5000ish instead of the normal 500ish. This is idle usage.
  2. When I switch models, or generate multiple pictures where the model switches via X/Y/Z, my memory usage grows steadily until it maxes out.

LuluViBritannia commented 1 year ago

Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed. It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the one that updates torch), the VRAM usage became normal again.

So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully it helps!
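For context, a hedged sketch of what a self-updating webui-user.bat of that kind can look like; the stock file ships without the two update lines, and the exact torch upgrade command people add varies, so both are shown only as illustrative examples of the sort of lines being removed:

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers

rem Auto-update lines like the two below (illustrative) are what gets removed:
rem git pull
rem pip install --upgrade torch torchvision

call webui.bat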

Nilok7 commented 1 year ago

> Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed. It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the one that updates torch), the VRAM usage became normal again.
>
> So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully it helps!

Which file did you edit? I don't have any command lines in webui-user.bat for that, and there isn't any git pull or torch in webui.bat.

LuluViBritannia commented 1 year ago

> Hey guys, I got a similar issue: I updated the UI, and for some reason the VRAM usage skyrocketed. It turned out I had to remove the command lines that start updates at launch. Literally half of my VRAM (3 GB out of 6) was taken from the start of the software, and after removing both command lines ("git pull" and the one that updates torch), the VRAM usage became normal again. So if you just updated the UI and you're now running out of VRAM, remove the command lines for the updates. Hopefully it helps!
>
> Which file did you edit? I don't have any command lines in webui-user.bat for that, and there isn't any git pull or torch in webui.bat.

The launcher (the webui-user.bat file). I had put two command lines in there for the updates, thinking they would only affect the launch, but they were actually taking 3 GB of VRAM for no reason.

In your case that doesn't seem to be the issue. Sorry I can't help ^^'.

catboxanon commented 1 year ago

I've made two PRs that I think will finally address this. voldy (auto) has also made recent improvements to the dev branch in https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/0af4127fd14360ebb12c6569d98aebf8047abbfc and https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/ccb92339348f6973de39cde062982a51a4cd0818 that should improve this as well. Basically, if you miss the performance of hires fix in the early days before https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/ef27a18b6b7cb1a8eebdc9b2e88d25baf2c2414d changed it, I think this now fixes it. Note you should be using --medvram (or --lowvram), not using --no-half-vae, and using a high-performance optimizer like xformers to take the most advantage of these.
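A minimal sketch of launch arguments matching that advice, set in webui-user.bat (these are existing WebUI flags; whether --medvram or --lowvram is the right choice depends on the card, and --no-half-vae is simply omitted):

rem webui-user.bat excerpt: low-VRAM-friendly launch options
set COMMANDLINE_ARGS=--medvram --xformers
call webui.bat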

https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/12514
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/12515

I also closed https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/6725 and https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/7002 since this issue is the most relevant. The former was just asking for old hires fix to be added back (where width/height is specified manually, which is supported) and the latter is technically a duplicate of this issue.

catboxanon commented 1 year ago

Closing this, as I've done a few tests and VRAM usage is significantly lower as of the latest dev branch commit. In the scenario given in the OP, VRAM peaks just under 6 GB, which fits well within their stated constraints. Open a new issue with more specifics if problems still occur.