AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: Out of memory happened when I tried to generate the second image, and Unload SD checkpoint to free VRAM did not work when out of memory happened #12809

Open Amazingldl opened 1 year ago

Amazingldl commented 1 year ago

Is there an existing issue for this?

What happened?

Related issues: Image generation fails after one image is generated

My GPU is an RTX 3070 Ti Laptop GPU with 8 GB of VRAM.

Out of memory happened when I tried to generate the second image; the exact steps are listed under Steps to reproduce below.

After that, I tried the Unload SD checkpoint to free VRAM and Reload the last SD checkpoint back into VRAM buttons in Settings > Actions, but neither of them worked.

Steps to reproduce the problem

  1. Open WebUI
  2. Go to txt2img
  3. Choose the sd_xl_base_1.0 checkpoint
  4. Write prompts and click the Generate button (generate just one image); this will work
  5. After step 4 is done, try to generate one more image; out of memory will happen (a scripted equivalent is sketched below)
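
For reference, the back-to-back generations can also be reproduced from a script, assuming the webui is launched with the --api flag so the /sdapi/v1/txt2img endpoint is available; the endpoint, payload keys, and port below are the webui API defaults, not details taken from this report:

import requests

# Minimal sketch: call txt2img twice in a row against a local webui
# started with --api. Payload keys follow the webui txt2img API schema.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
payload = {
    "prompt": "minimalistic bear shaman logo",
    "steps": 25,
    "width": 1024,
    "height": 1024,
}

for attempt in (1, 2):
    r = requests.post(URL, json=payload, timeout=3600)
    # On the reported setup, the second call is the one that fails with CUDA OOM.
    print(f"attempt {attempt}: HTTP {r.status_code}")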

What should have happened?

I used the same settings for both generations, so if the first one worked, the second should work too. Also, the Unload SD checkpoint to free VRAM button should actually free VRAM when out of memory happens.

Version or Commit where the problem happens

v1.5.0

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Windows

What device are you running WebUI on?

Other GPUs

Cross attention optimization

Automatic

What browsers do you use to access the UI ?

Microsoft Edge

Command Line Arguments

--xformers

List of extensions

Console logs

Python 3.10.6 | packaged by conda-forge | (main, Oct 24 2022, 16:02:16) [MSC v.1916 64 bit (AMD64)]
Version: v1.5.0
Commit hash: a3ddf464a2ed24c999f67ddfef7969f8291567be

Launching Web UI with arguments: --xformers
Civitai Helper: Get Custom Model Folder
Civitai Helper: Load setting from: D:\Workspace\stable-diffusion-webui\extensions\Stable-Diffusion-Webui-Civitai-Helper\setting.json
2023-08-27 17:03:22,290 - ControlNet - INFO - ControlNet v1.1.233
ControlNet preprocessor location: D:\Workspace\stable-diffusion-webui\extensions\sd-webui-controlnet\annotator\downloads
2023-08-27 17:03:22,384 - ControlNet - INFO - ControlNet v1.1.233
Loading weights [e6bb9ea85b] from D:\Workspace\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0_0.9vae.safetensors
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 16.1s (launcher: 6.2s, import torch: 3.6s, import gradio: 1.0s, setup paths: 1.1s, other imports: 1.1s, load scripts: 1.6s, create ui: 0.9s, gradio launch: 0.4s).
Creating model from config: D:\Workspace\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Applying attention optimization: xformers... done.
Model loaded in 73.5s (load weights from disk: 2.2s, create model: 1.0s, apply weights to model: 52.7s, apply half(): 14.7s, move model to device: 2.2s, calculate empty prompt: 0.5s).
100%|██████████████████████████████████████████████████████████████████████████████████████████| 25/25 [10:21<00:00, 24.86s/it]
=========================================================================================
A tensor with all NaNs was produced in VAE.
Web UI will now convert VAE into 32-bit float and retry.
To disable this behavior, disable the 'Automatically revert VAE to 32-bit floats' setting.
To always start with 32-bit VAE, use --no-half-vae commandline flag.
=========================================================================================
Total progress: 100%|██████████████████████████████████████████████████████████████████████████| 25/25 [12:02<00:00, 28.90s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████| 25/25 [11:19<00:00, 27.20s/it]
*** Error completing request
Exception in thread MemMon:
*** Arguments: ('task(0su720x0ea0x3x7)', 'minimalistic bear shaman logo', '', [], 25, 16, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 1024, 1024, False, 0.7, 2, 'Latent', 0, 0, 0, 0, '', '', [], <gradio.routes.Request object at 0x000001D66E0C5F60>, 0, True, False, False, False, 'Line Art', <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001D66E0C6860>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001D66E0C4070>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001D66E0C4430>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001D3A0A9FD00>, False, False, 'Matrix', 'Horizontal', 'Mask', 'Prompt', '1,1', '0.2', False, False, False, 'Attention', False, '0', '0', '0.4', None, '0', '0', False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, None, None, False, None, None, False, None, None, False, None, None, False, 50) {}
Traceback (most recent call last):
  File "D:\App\miniconda3\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "D:\Workspace\stable-diffusion-webui\modules\memmon.py", line 53, in run
    free, total = self.cuda_mem_get_info()
  File "D:\Workspace\stable-diffusion-webui\modules\memmon.py", line 34, in cuda_mem_get_info
    return torch.cuda.mem_get_info(index)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 618, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

    Traceback (most recent call last):
      File "D:\Workspace\stable-diffusion-webui\modules\call_queue.py", line 58, in f
        res = list(func(*args, **kwargs))
      File "D:\Workspace\stable-diffusion-webui\modules\call_queue.py", line 37, in f
        res = func(*args, **kwargs)
      File "D:\Workspace\stable-diffusion-webui\modules\txt2img.py", line 62, in txt2img
        processed = processing.process_images(p)
      File "D:\Workspace\stable-diffusion-webui\modules\processing.py", line 673, in process_images
        res = process_images_inner(p)
      File "D:\Workspace\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "D:\Workspace\stable-diffusion-webui\modules\processing.py", line 795, in process_images_inner
        x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
      File "D:\Workspace\stable-diffusion-webui\modules\processing.py", line 545, in decode_latent_batch
        sample = decode_first_stage(model, batch[i:i + 1])[0]
      File "D:\Workspace\stable-diffusion-webui\modules\processing.py", line 576, in decode_first_stage
        x = model.decode_first_stage(x.to(devices.dtype_vae))
      File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "D:\Workspace\stable-diffusion-webui\repositories\generative-models\sgm\models\diffusion.py", line 121, in decode_first_stage
        out = self.first_stage_model.decode(z)
      File "D:\Workspace\stable-diffusion-webui\repositories\generative-models\sgm\models\autoencoder.py", line 315, in decode
        dec = self.decoder(z, **decoder_kwargs)
      File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\Workspace\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 728, in forward
        h = self.up[i_level].block[i_block](h, temb, **kwargs)
      File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\Workspace\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\model.py", line 132, in forward
        h = self.conv1(h)
      File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "D:\Workspace\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 371, in network_Conv2d_forward
        return torch.nn.Conv2d_forward_before_network(self, input)
      File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: CUDA error: out of memory
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

---
Traceback (most recent call last):
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\Workspace\stable-diffusion-webui\modules\call_queue.py", line 93, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "D:\Workspace\stable-diffusion-webui\modules\memmon.py", line 92, in stop
    return self.read()
  File "D:\Workspace\stable-diffusion-webui\modules\memmon.py", line 77, in read
    free, total = self.cuda_mem_get_info()
  File "D:\Workspace\stable-diffusion-webui\modules\memmon.py", line 34, in cuda_mem_get_info
    return torch.cuda.mem_get_info(index)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\cuda\memory.py", line 618, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 422, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\Workspace\stable-diffusion-webui\modules\sd_models.py", line 608, in unload_model_weights
    model_data.sd_model.to(devices.cpu)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 54, in to
    return super().to(*args, **kwargs)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "D:\Workspace\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
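
Reading the logs in order: the first generation finishes sampling, produces NaNs in the fp16 VAE, and is retried with a 32-bit VAE, which succeeds. The second generation then fails with CUDA out of memory inside the VAE decode (decode_first_stage). Once the card is in that state, the memory monitor's torch.cuda.mem_get_info call and the model_data.sd_model.to(devices.cpu) call behind Unload SD checkpoint to free VRAM raise the same CUDA out-of-memory error, which is why the button appears to do nothing. A conceptual sketch of an fp16-decode-with-fp32-retry pattern (assumed names, not the webui's actual code) to illustrate why the retry needs extra VRAM:

import torch

def decode_with_fp32_fallback(vae, latents):
    # 'vae' is assumed to be a torch.nn.Module exposing .decode(latents);
    # this is an illustration of the fallback pattern, not the webui's code.
    with torch.no_grad():
        out = vae.half().decode(latents.half())
        if torch.isnan(out).any():
            # Retry the decode in full precision: parameters and activations
            # take roughly twice the memory of the fp16 pass, which can be
            # enough to push an 8 GB card over the edge with SDXL-sized latents.
            out = vae.float().decode(latents.float())
    return out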

Additional information

I want to know how to deal with the out-of-memory error when it happens. Is restarting the terminal the only option? That takes a lot of time. I tried Reload UI and Unload SD checkpoint to free VRAM, but neither works.
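
On recovering without a restart: if the process itself is still healthy, the usual way to release cached VRAM from Python is to drop references to the offending tensors, run the garbage collector, and clear PyTorch's caching allocator, roughly as sketched below (these are standard PyTorch calls, not a webui feature). If the CUDA context has been left in an unrecoverable error state, restarting the process is the only reliable fix.

import gc
import torch

def try_release_vram():
    # Best-effort cleanup from inside the running process:
    # collect unreachable Python objects that may still hold CUDA tensors,
    # then return cached blocks from PyTorch's allocator to the driver.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
        free, total = torch.cuda.mem_get_info()
        print(f"free VRAM: {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")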

catboxanon commented 1 year ago

Does this same issue occur on 1.6.0-RC?

MikeBivins commented 1 year ago

This has been the case for as long as I have used automatic1111; it also happens with SD 1.5 and possibly on the new 1.6 RC as well.
- Generate an image with hires fix enabled. It doesn't matter how you do it; the goal is to find the point where GPU memory hits its maximum (this depends on the GPU).
- Let it run two or three more times at a slightly lower resolution. After it reports insufficient memory twice or more, you will notice that GPU memory usage stays higher than usual for no reason at all.
- There appears to be a memory leak somewhere, which many others have also reported.
The only solution I have right now is to know your GPU and not push it to full power with no buffer. Otherwise you will be resetting much more than the GPU, I tell you, haha; been there, done that.
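
One way to check for the kind of creeping memory use described above is to log PyTorch's allocator statistics between generations; if the allocated number keeps growing across otherwise identical runs, something is holding on to tensors (a plain PyTorch snippet, not tied to the webui):

import torch

def log_vram(tag: str):
    # memory_allocated: bytes currently held by live tensors in this process.
    # memory_reserved: bytes held by PyTorch's caching allocator (>= allocated).
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated={alloc:.0f} MiB reserved={reserved:.0f} MiB")

# e.g. call log_vram("after image 1"), log_vram("after image 2"), and compare.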

spotlesscoder commented 11 months ago

@MikeBivins "Otherwise you will be resetting much more than the gpu" what do you mean by that?

ice-fly commented 10 months ago

See Optimizations ... Verify you have enabled Resizable BAR in your BIOS. Check Task Manager to diagnose.
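
For what Task Manager shows (GPU memory used by every process, not just the webui), the same numbers can be read programmatically with NVML; a small sketch assuming the nvidia-ml-py (pynvml) package is installed:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
# NVML reports device-wide usage, i.e. all processes combined,
# unlike torch.cuda.memory_allocated(), which is per-process.
print(f"used {info.used / 2**20:.0f} MiB of {info.total / 2**20:.0f} MiB")
pynvml.nvmlShutdown()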