AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Recent update broke memory for me. #10559

Closed. T9es closed this issue 1 year ago.

T9es commented 1 year ago

Is there an existing issue for this?

What happened?

After updating with git pull, the latest release seems to have A LOT of issues. In my case, the models never seem to leave RAM. When running one instance, I go from 7 GB of RAM usage up to 16 GB in seconds. I have 32 GB, but this causes a lot of instability in the system. I can't generate images that I could generate yesterday, or even this morning before updating.

On top of this, here's a weird screenshot from the card I dedicate to SD (screenshot attached). See the issue? There's VRAM available, but SD isn't using it.

Steps to reproduce the problem

There are no steps. Just start the UI and run a regular generation, as usual. No specific settings, no LoRAs, just typing "yes" into the prompt window and hitting generate.

What should have happened?

Everything gets loaded, RAM gets cleared, there are no issues with VRAM allocation.

Commit where the problem happens

22bcc7be428c94e9408f589966c2040187245d81

What platforms do you use to access the UI ?

Windows 10

What browsers do you use to access the UI ?

Any browser: Chrome, Firefox, Edge, Opera.

Command Line Arguments

--port 7861 --xformers --no-half-vae --opt-split-attention --medvram --ckpt-dir "R:\Stable diffusion\Stable-diffusion" --hypernetwork-dir "R:\Stable diffusion\Lora"

(These might be a bit messed up, as I was experimenting with them while troubleshooting, with no results.)
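For what it's worth, the startup log below shows "Applying xformers cross attention optimization", so --opt-split-attention is effectively ignored while --xformers is active, and the Lora folder is being passed to --hypernetwork-dir rather than --lora-dir. Assuming that folder really holds Loras, a cleaner equivalent set of flags would be something like:

--port 7861 --xformers --no-half-vae --medvram --ckpt-dir "R:\Stable diffusion\Stable-diffusion" --lora-dir "R:\Stable diffusion\Lora"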

List of extensions

They don't matter. I ran a clean install with no extensions on my second card and the issue was still there.

Console logs

(These are from the 8gb card with extensions.)

venv "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
Installing requirements for Web UI

Installing requirements for Batch Face Swap

Installing Installing onnxruntime-gpu...
Installing Installing opencv-python...
Installing Installing Pillow...
Installing Installing segmentation-refinement...
Installing Installing scikit-learn...

Installing sd-dynamic-prompts requirements.txt

Installing requirements for Shift Attention

Launching Web UI with arguments: --port 7861 --xformers --no-half-vae --opt-split-attention --medvram --ckpt-dir R:\Stable diffusion\Stable-diffusion --hypernetwork-dir R:\Stable diffusion\Lora
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
ControlNet v1.1.140
ControlNet v1.1.140
Loading weights [deb40a068a] from R:\Stable diffusion\Stable-diffusion\meinahentai_v21.safetensors
Creating model from config: Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\models\VAE\vae-ft-ema-560000-ema-pruned.ckpt
Applying xformers cross attention optimization.
Error loading embedding eyesgenLoraWIP_v1.safetensors:
Traceback (most recent call last):
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 210, in load_from_dir
    self.load_from_file(fullfn, fn)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 176, in load_from_file
    assert len(data.keys()) == 1, 'embedding file has multiple terms in it'
AssertionError: embedding file has multiple terms in it

Model loaded in 1.6s (load weights from disk: 0.2s, create model: 0.3s, apply weights to model: 0.4s, apply half(): 0.4s, load VAE: 0.2s, load textual inversion embeddings: 0.1s).
add tab
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
Startup time: 23.1s (import torch: 4.7s, import gradio: 1.2s, import ldm: 0.6s, other imports: 0.9s, load scripts: 7.8s, load SD checkpoint: 1.7s, create ui: 5.9s, gradio launch: 0.1s).
100%|██████████████████████████████████████████████████████████████████████████████████| 42/42 [00:06<00:00,  6.35it/s]
Error completing request▎                                                             | 42/840 [00:07<01:29,  8.95it/s]
Arguments: ('task(34tkte726qc4pld)', 'yes', '', [], 42, 15, True, False, 10, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 288, True, 0.5, 2, '4x-UltraSharp', 0, 0, 0, [], 0, False, 'txt2img', False, '', '', False, 'Euler a', False, 'meinahentai_v21.safetensors [deb40a068a]', True, 0.5, True, 4, True, 32, True, False, 30, False, 6, False, 512, 512, '', False, 1, 'Both ▦', False, '', False, True, True, False, False, False, False, 1, False, '', '', '', 'generateMasksTab', 1, 4, 2.5, 30, 1.03, 1, 1, 5, 0.5, 5, False, True, False, 20, False, 'None', 'None', 'None', 'None', 'None', 0.7, 'None', True, False, 1, False, False, False, 1.1, 1.5, 100, 0.7, False, False, True, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', False, 7, 100, 'Constant', 0, 'Constant', 0, 4, False, 'x264', 'mci', 10, 0, False, True, True, True, 'intermediate', 'animation', False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <controlnet.py.UiControlNetUnit object at 0x00000267C0B337F0>, False, 1, 0.15, False, 'OUT', ['OUT'], 5, 0, 'Bilinear', False, 'Pooling Max', False, 'Lerp', '', '', False, False, None, True, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0, False, False, False, '#000000', False, '', 'None', 30, 4, 0, 0, False, 'None', '<br>', 'None', 30, 4, 0, 0, 4, 0.4, True, 32, None, False, 50, 10.0, 30.0, True, 0.0, 'Lanczos', 1, 0, 0, 75, 0.0001, 0.0, False, True, False) {}
Traceback (most recent call last):
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\processing.py", line 653, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\processing.py", line 902, in sample
    decoded_samples = decode_first_stage(self.sd_model, samples)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\processing.py", line 440, in decode_first_stage
    x = model.decode_first_stage(x)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\lowvram.py", line 51, in first_stage_model_decode_wrap
    send_me_to_gpu(first_stage_model, None)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\lowvram.py", line 33, in send_me_to_gpu
    module_in_gpu.to(cpu)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 54, in to
    return super().to(*args, **kwargs)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 989, in to
    return self._apply(convert)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 641, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 664, in _apply
    param_applied = fn(param)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3276800 bytes.

Additional information

2 cards with the same issue: a 3060 Ti (8 GB) and a 2060 (6 GB). The 2060 has no extensions at all; it runs vanilla with just --xformers and --api.
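For context on the traceback: with --medvram the web UI keeps only one large module on the GPU at a time and pushes the previous one back into system RAM (that's modules/lowvram.py in the stack above). The failing .to(cpu) call is that host-side move, so it's CPU RAM / commit space that runs out here, not VRAM. A rough sketch of the pattern, not the project's exact code:

import torch
import torch.nn as nn

cpu = torch.device("cpu")
gpu = torch.device("cuda")

module_in_gpu = None  # at most one big module lives on the GPU at a time

def send_me_to_gpu(module: nn.Module) -> None:
    """Evict whatever module currently sits on the GPU back to CPU RAM, then move `module` in."""
    global module_in_gpu
    if module_in_gpu is module:
        return
    if module_in_gpu is not None:
        # This host-side allocation is where DefaultCPUAllocator raised
        # "not enough memory" in the log above.
        module_in_gpu.to(cpu)
    module.to(gpu)
    module_in_gpu = module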

Sakura-Luna commented 1 year ago

Do you have a way to reproduce it reliably?

T9es commented 1 year ago

Not really. This just keeps happening no matter what I do. I have no clue what the cause could be or how to even get close to the problem. All I know is that it was working before the recent update; after that it was a no-go. Maybe the venv got messed up?

Sakura-Luna commented 1 year ago

DefaultCPUAllocator: not enough memory

This is a problem that has been around for a while; maybe a recent update made it more likely to occur, but it isn't a new problem. I haven't found out why.

T9es commented 1 year ago

Oh my god, I have a clue about what might have happened. I was messing around with my new SSD yesterday; it might be something related to the page file. I'll look into this later, since I'm not home right now, and update if that fixes it.

This would explain some things crashing, like Discord and Chrome, although I have no clue why there would be crashes while only 16 GB out of 32 are being used.

T9es commented 1 year ago

Seems like this was a page file issue. I'm going to run some tests on my RAM, since that's the main part that could have failed. After setting the page file back to its previous settings, everything is working fine.
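If anyone else hits this, a quick way to check whether it's the page file rather than physical RAM is to look at the remaining commit space while generating. A minimal sketch, assuming psutil is installed (pip install psutil) and that Windows reports the page file as swap:

import psutil

vm = psutil.virtual_memory()
sw = psutil.swap_memory()
print(f"RAM:       {vm.available / 2**30:.1f} GiB free of {vm.total / 2**30:.1f} GiB")
print(f"page file: {sw.free / 2**30:.1f} GiB free of {sw.total / 2**30:.1f} GiB")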

sangoi-exe commented 1 year ago

The problem exists; it's happening to me too. On Colab it collapses from time to time because of excessive RAM use.

T9es commented 1 year ago

I'm re-opening the issue, as it still happens from time to time. I've been playing around with generations, and I get DefaultCPUAllocator errors every now and then. RAM usage peaks at about 60-70% of 32 GB. After the error, I need to restart the web UI, since it refuses to generate anything.

I saw people suggesting that it's some kind of RAM issue (not enough of it), but as you can see, that's not the case here.

Sakura-Luna commented 1 year ago

Try Windows' memory test.

T9es commented 1 year ago

Already did. No issues across 10 tests; I even switched RAM slots for another set of 10 tests.

thesomeotherguy commented 1 year ago

My baseline when creating an image is always 608 x 768, and then I use Hires fix at 1.8x to get approximately Instagram's 4:5 resolution. Final size: 1094x1382, while Instagram 4:5 is 1080x1350. So I need that resolution in one click.

For me, since 1.3.0 and now 1.3.1, that 608 x 768 resolution with a 1.8x upscale in Hires fix can no longer be used (VRAM always runs out of memory), and I have to lower my resolution or upscale value, which leaves my final resolution lower than I need. Setting Hires steps to half of the sampling steps doesn't help either.

For information, my machine is an RTX 3060 Laptop with 6 GB of VRAM. I always use --xformers --medvram.

The only way for me to reach my target resolution while preserving detail is to use img2img with a custom upscaler setting that mimics what Hires fix does, and that works: no VRAM out-of-memory. It's weird, because it's basically the same thing. The setting is under Settings > Upscaling > Upscaler for img2img, changed to the upscaler I previously used in Hires fix.

But now I have to switch tabs to upscale every time. It works, but it's not convenient.

Everyone on other forums and in other issues related to this keeps suggesting Doggettx as the optimization method after 1.3.0.

And I swear I already tried it. I removed xformers so that only --medvram was left in my args, selected Doggettx in Settings, and restarted the command prompt, but that resolution and Hires fix value are still not usable at all.

It's real: Doggettx is not for every machine. In my case, even with it on, Hires fix still runs out of memory.

So I was afraid it was my machine's fault, or that some Windows setting/update/driver did something bad. So I created a new folder to git clone and hard reset to the latest known-good A1111, for me v1.2.1 (hash 89f9faa), and installed that version from scratch in another folder, roughly as sketched below.
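Something like this (the folder name is just for illustration):

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git sd-webui-1.2.1
cd sd-webui-1.2.1
git checkout 89f9faa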

Running from that folder (v1.2.1, 89f9faa), Hires fix can be used as before, with up to a 1.8x upscale, but on 1.3.0 and 1.3.1 it indeed cannot.

So it's not my system, because Hires fix clearly works fine on v1.2.1 (89f9faa).

I'm not complaining, I'm just pointing out that something is wrong with 1.3.0 and 1.3.1, that Doggettx is not my answer, and that I need help.

T9es commented 1 year ago

Closing this down, as it's no longer an issue for me. It has been working fine for the last month or so without any problems. Still related to the page file on my end.