[Bug]: "CUDA out of memory" after update to the latest commit -> Stable diffusion model failed to load

Gourieff commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

Hi there. I have an RTX 3060 6GB in my laptop. This GPU works great even with SDXL models (with some optimizations). Everything was fine before today's update (2023-09-02). I use an SD1.5-based model (5.3GB), and until today there were no errors. This morning I ran the usual git pull to get the latest commits, and after that SD WebUI refuses to load the 5.3GB model, failing with a "CUDA out of memory" error.

NVIDIA driver version: latest, 537.13

I've just tried a "git reset" to commit e7965a5e, and all is fine now: the model loads with no errors.

Please check the latest commit; something went wrong there.

Steps to reproduce the problem

Launch the web UI on a 6GB NVIDIA GPU and try to load any model larger than 5GB, using this webui-user.bat:

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--opt-sdp-attention --upcast-sampling --api
set SAFETENSORS_FAST_GPU=1
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6, max_split_size_mb:128
set CUDA_MODULE_LOADING=LAZY
set NUMEXPR_MAX_THREADS=16
set GRADIO_ANALYTICS_ENABLED=False
set CUDA_AUTO_BOOST=1

call webui.bat

What should have happened?

The 5.3GB SD1.5-based model should have loaded as before.

Sysinfo

sysinfo-2023-09-02-08-42.txt

What browsers do you use to access the UI ?

Mozilla Firefox

Console logs

To create a public link, set `share=True` in `launch()`.
Startup time: 24.4s (prepare environment: 5.5s, import torch: 8.7s, import gradio: 1.4s, setup paths: 0.9s, initialize shared: 0.3s, other imports: 1.0s, setup codeformer: 0.2s, load scripts: 4.6s, create ui: 1.1s, gradio launch: 0.5s, add APIs: 0.2s).
Creating model from config: F:\SD\A1111\stable-diffusion-webui\configs\v1-inference.yaml
loading stable diffusion model: OutOfMemoryError
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\modules\initialize.py", line 147, in load_model
    shared.sd_model  # noqa: B018
  File "F:\SD\A1111\stable-diffusion-webui\modules\shared_items.py", line 110, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 499, in get_sd_model
    load_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 626, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 353, in load_model_weights
    model.load_state_dict(state_dict, strict=False)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 223, in <lambda>
    module_load_state_dict = self.replace(torch.nn.Module, 'load_state_dict', lambda *args, **kwargs: load_state_dict(module_load_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 221, in load_state_dict
    original(module, state_dict, strict=strict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 5 more times]
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2009, in load
    module._load_from_state_dict(
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 225, in <lambda>
    linear_load_from_state_dict = self.replace(torch.nn.Linear, '_load_from_state_dict', lambda *args, **kwargs: load_from_state_dict(linear_load_from_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 191, in load_from_state_dict
    module._parameters[name] = torch.nn.parameter.Parameter(torch.zeros_like(param, device=device, dtype=dtype), requires_grad=param.requires_grad)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_meta_registrations.py", line 1780, in zeros_like
    return aten.empty_like.default(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 6.00 GiB total capacity; 5.03 GiB already allocated; 0 bytes free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Stable diffusion model failed to load
Applying attention optimization: sdp... done.
Exception in thread Thread-24 (load_model):
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\modules\initialize.py", line 153, in load_model
    devices.first_time_calculation()
  File "F:\SD\A1111\stable-diffusion-webui\modules\devices.py", line 152, in first_time_calculation
    conv2d(x)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\extensions\a1111-sd-webui-locon\scripts\..\..\..\extensions-builtin/Lora\networks.py", line 444, in network_Conv2d_forward
    return originals.Conv2d_forward(self, input)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Loading weights [f89ec39ba0] from F:\SD\A1111\stable-diffusion-webui\models\Stable-diffusion\Realistic\+cyberrealistic_classicV17.safetensors
Creating model from config: F:\SD\A1111\stable-diffusion-webui\configs\v1-inference.yaml
loading stable diffusion model: OutOfMemoryError
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 392, in pages_html
    return refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 398, in refresh
    pg.refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks_textual_inversion.py", line 13, in refresh
    sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings(force_reload=True)
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 255, in load_textual_inversion_embeddings
    self.expected_shape = self.get_expected_shape()
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 154, in get_expected_shape
    vec = shared.sd_model.cond_stage_model.encode_embedding_init_text(",", 1)
  File "F:\SD\A1111\stable-diffusion-webui\modules\shared_items.py", line 110, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 499, in get_sd_model
    load_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 626, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 353, in load_model_weights
    model.load_state_dict(state_dict, strict=False)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 223, in <lambda>
    module_load_state_dict = self.replace(torch.nn.Module, 'load_state_dict', lambda *args, **kwargs: load_state_dict(module_load_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 221, in load_state_dict
    original(module, state_dict, strict=strict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 5 more times]
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2009, in load
    module._load_from_state_dict(
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 225, in <lambda>
    linear_load_from_state_dict = self.replace(torch.nn.Linear, '_load_from_state_dict', lambda *args, **kwargs: load_from_state_dict(linear_load_from_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 191, in load_from_state_dict
    module._parameters[name] = torch.nn.parameter.Parameter(torch.zeros_like(param, device=device, dtype=dtype), requires_grad=param.requires_grad)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_meta_registrations.py", line 1780, in zeros_like
    return aten.empty_like.default(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 6.00 GiB total capacity; 5.02 GiB already allocated; 0 bytes free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Stable diffusion model failed to load
Loading weights [f89ec39ba0] from F:\SD\A1111\stable-diffusion-webui\models\Stable-diffusion\Realistic\+cyberrealistic_classicV17.safetensors
Traceback (most recent call last):
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 392, in pages_html
    return refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 398, in refresh
    pg.refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks_textual_inversion.py", line 13, in refresh
    sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings(force_reload=True)
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 255, in load_textual_inversion_embeddings
    self.expected_shape = self.get_expected_shape()
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 154, in get_expected_shape
    vec = shared.sd_model.cond_stage_model.encode_embedding_init_text(",", 1)
AttributeError: 'NoneType' object has no attribute 'cond_stage_model'
Creating model from config: F:\SD\A1111\stable-diffusion-webui\configs\v1-inference.yaml
loading stable diffusion model: OutOfMemoryError
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 392, in pages_html
    return refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 398, in refresh
    pg.refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks_textual_inversion.py", line 13, in refresh
    sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings(force_reload=True)
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 255, in load_textual_inversion_embeddings
    self.expected_shape = self.get_expected_shape()
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 154, in get_expected_shape
    vec = shared.sd_model.cond_stage_model.encode_embedding_init_text(",", 1)
  File "F:\SD\A1111\stable-diffusion-webui\modules\shared_items.py", line 110, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 499, in get_sd_model
    load_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 626, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 353, in load_model_weights
    model.load_state_dict(state_dict, strict=False)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 223, in <lambda>
    module_load_state_dict = self.replace(torch.nn.Module, 'load_state_dict', lambda *args, **kwargs: load_state_dict(module_load_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 221, in load_state_dict
    original(module, state_dict, strict=strict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 5 more times]
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2009, in load
    module._load_from_state_dict(
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 225, in <lambda>
    linear_load_from_state_dict = self.replace(torch.nn.Linear, '_load_from_state_dict', lambda *args, **kwargs: load_from_state_dict(linear_load_from_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 191, in load_from_state_dict
    module._parameters[name] = torch.nn.parameter.Parameter(torch.zeros_like(param, device=device, dtype=dtype), requires_grad=param.requires_grad)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_meta_registrations.py", line 1780, in zeros_like
    return aten.empty_like.default(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 6.00 GiB total capacity; 5.02 GiB already allocated; 0 bytes free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Stable diffusion model failed to load
Loading weights [f89ec39ba0] from F:\SD\A1111\stable-diffusion-webui\models\Stable-diffusion\Realistic\+cyberrealistic_classicV17.safetensors
Traceback (most recent call last):
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 392, in pages_html
    return refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks.py", line 398, in refresh
    pg.refresh()
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui_extra_networks_textual_inversion.py", line 13, in refresh
    sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings(force_reload=True)
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 255, in load_textual_inversion_embeddings
    self.expected_shape = self.get_expected_shape()
  File "F:\SD\A1111\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 154, in get_expected_shape
    vec = shared.sd_model.cond_stage_model.encode_embedding_init_text(",", 1)
AttributeError: 'NoneType' object has no attribute 'cond_stage_model'
Creating model from config: F:\SD\A1111\stable-diffusion-webui\configs\v1-inference.yaml
loading stable diffusion model: OutOfMemoryError
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "F:\SD\A1111\stable-diffusion-webui\modules\ui.py", line 1298, in <lambda>
    update_image_cfg_scale_visibility = lambda: gr.update(visible=shared.sd_model and shared.sd_model.cond_stage_key == "edit")
  File "F:\SD\A1111\stable-diffusion-webui\modules\shared_items.py", line 110, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 499, in get_sd_model
    load_model()
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 626, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_models.py", line 353, in load_model_weights
    model.load_state_dict(state_dict, strict=False)
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 223, in <lambda>
    module_load_state_dict = self.replace(torch.nn.Module, 'load_state_dict', lambda *args, **kwargs: load_state_dict(module_load_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 221, in load_state_dict
    original(module, state_dict, strict=strict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2027, in load_state_dict
    load(self, state_dict)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2015, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 5 more times]
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 2009, in load
    module._load_from_state_dict(
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 225, in <lambda>
    linear_load_from_state_dict = self.replace(torch.nn.Linear, '_load_from_state_dict', lambda *args, **kwargs: load_from_state_dict(linear_load_from_state_dict, *args, **kwargs))
  File "F:\SD\A1111\stable-diffusion-webui\modules\sd_disable_initialization.py", line 191, in load_from_state_dict
    module._parameters[name] = torch.nn.parameter.Parameter(torch.zeros_like(param, device=device, dtype=dtype), requires_grad=param.requires_grad)
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_meta_registrations.py", line 1780, in zeros_like
    return aten.empty_like.default(
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_ops.py", line 287, in __call__
    return self._op(*args, **kwargs or {})
  File "F:\SD\A1111\stable-diffusion-webui\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 6.00 GiB total capacity; 5.02 GiB already allocated; 0 bytes free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Stable diffusion model failed to load

Additional information

No response

w-e-w commented 1 year ago

My guess is that you were probably already running very close to the edge, and now something extra in the background is taking up the last bit of VRAM you need.

Lots of things, such as a web browser, can use some of your VRAM, depending on how they're configured and whether hardware acceleration is on.

Also, I would try using --medvram-sdxl: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings

blackmoon96 commented 1 year ago

Me too. After the update I can't run it: "Stable diffusion model failed to load".

Gourieff commented 1 year ago

> My guess is that you were probably already running very close to the edge, and now something extra in the background is taking up the last bit of VRAM you need.
>
> Lots of things, such as a web browser, can use some of your VRAM, depending on how they're configured and whether hardware acceleration is on.
>
> Also, I would try using --medvram-sdxl: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings

Earlier, even 8GB models worked fine, as did SDXL models, so this isn't about hitting the limit of this GPU or its VRAM capacity. Something is wrong with the latest updates after the very stable 1.5.2 version.

I also noticed that image generation takes longer now; it seems to hang on the last few percent.

There are a bunch of errors in the console in different situations. For example, when I stop the server (Ctrl+C), I get a lot of errors while it's closing. Version 1.5.2 and earlier never had anything like this.

I've just done a clean install of 1.5.2: it works perfectly, very fast and with no errors.

So guys, version 1.6.0 is still raw

Nerogante commented 1 year ago

My generations won't even start; it hangs at 0% and does not progress. My fans start spinning like before, but it's stuck at 0% forever.

Acephalia commented 1 year ago

Having the same issue on my 3090 24GB with the latest pull. I could generate with SDXL + Refiner without any issues, but ever since the pull it has been OOM-ing like crazy.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 24.00 GiB total capacity; 10.66 GiB already allocated; 10.70 GiB free; 10.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
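
For anyone comparing these numbers across reports, here is a minimal Python sketch that pulls the figures out of an OOM message like the one above. The function name `parse_oom` and the regular expression are assumptions written for this message wording only; other PyTorch versions may phrase the error differently.

```python
import re

# One size token as it appears in the message, e.g. "10.66 GiB" or "0 bytes".
_SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"
_UNITS = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# Assumed layout, taken from the messages quoted in this thread.
_PATTERN = re.compile(
    rf"Tried to allocate {_SIZE} \(GPU \d+; {_SIZE} total capacity; "
    rf"{_SIZE} already allocated; {_SIZE} free; {_SIZE} reserved"
)

_FIELDS = ["requested", "total", "allocated", "free", "reserved"]


def parse_oom(message):
    """Return the five sizes from an OOM message in bytes, or None if no match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    groups = match.groups()
    # Groups come in (number, unit) pairs, one pair per field.
    return {
        name: float(groups[2 * i]) * _UNITS[groups[2 * i + 1]]
        for i, name in enumerate(_FIELDS)
    }
```

For the message above, `reserved - allocated` comes out to about 0.33 GiB, so the gap that the error's own hint associates with fragmentation is small here, while 10.70 GiB is still reported as free.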

allkhor commented 1 year ago

Same here, NVIDIA 1060 with 6GB:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 5.92 GiB total capacity; 5.01 GiB already allocated; 77.75 MiB free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Back to 1.5.2

wagontrader commented 1 year ago

> Earlier, even 8GB models worked fine, as did SDXL models, so this isn't about hitting the limit of this GPU or its VRAM capacity. Something is wrong with the latest updates after the very stable 1.5.2 version.

Ran into this issue when I loaded the base and refiner checkpoints into VRAM. Checking the 'Only keep one model on device' setting fixed the issue.

Acephalia commented 1 year ago

> > Earlier, even 8GB models worked fine, as did SDXL models, so this isn't about hitting the limit of this GPU or its VRAM capacity. Something is wrong with the latest updates after the very stable 1.5.2 version.
>
> Ran into this issue when I loaded the base and refiner checkpoints into VRAM. Checking the 'Only keep one model on device' setting fixed the issue.

Can confirm, this works for XL. Not sure why I can't keep both models loaded, given that I should have plenty of VRAM to spare with 24GB :/

PS: It now hangs quite a bit during the swap; image generation at 1024x1024 fluctuates wildly between 30 seconds and 2 minutes.

Gourieff commented 1 year ago

> > Earlier, even 8GB models worked fine, as did SDXL models, so this isn't about hitting the limit of this GPU or its VRAM capacity. Something is wrong with the latest updates after the very stable 1.5.2 version.
>
> Ran into this issue when I loaded the base and refiner checkpoints into VRAM. Checking the 'Only keep one model on device' setting fixed the issue.

Thanks, I will try. But I don't think that's right even for my 6GB GPU, not to mention GPUs with more VRAM, especially 24GB. It's nonsense!

wagontrader commented 1 year ago

> Can confirm, this works for XL. Not sure why I can't keep both models loaded, given that I should have plenty of VRAM to spare with 24GB :/
>
> PS: It now hangs quite a bit during the swap; image generation at 1024x1024 fluctuates wildly between 30 seconds and 2 minutes.

I have 16GB VRAM, however only 8 is shared, so it does appear to depend on how much memory your GPU is set up to share.

Acephalia commented 1 year ago

> > Can confirm, this works for XL. Not sure why I can't keep both models loaded, given that I should have plenty of VRAM to spare with 24GB :/
> >
> > PS: It now hangs quite a bit during the swap; image generation at 1024x1024 fluctuates wildly between 30 seconds and 2 minutes.
>
> I have 16GB VRAM, however only 8 is shared, so it does appear to depend on how much memory your GPU is set up to share.

@wagontrader the card is dedicated to SD only. I run the iGPU for other tasks and my display. I was doing some reading, and the PyTorch allocated amount seems to be the VRAM set aside for SD to use, which would explain the OOM. Do you have any idea how I can just let SD use the whole VRAM?

There seems to be a PyTorch setting for

torch.cuda.set_per_process_memory_fraction()

but I have no idea how or where to set it for A1111
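
As far as the PyTorch documentation describes it, `torch.cuda.set_per_process_memory_fraction(fraction, device)` sets an upper limit on what the caching allocator may take from a device, so it would more likely restrict usage than unlock extra VRAM. The sketch below only illustrates the call. The helper names `cap_in_bytes` and `apply_cap` are hypothetical, and where such a call would belong in A1111 is not something this thread establishes.

```python
def cap_in_bytes(total_bytes, fraction):
    """Bytes the allocator could use under the given fraction of total VRAM."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return int(total_bytes * fraction)


def apply_cap(fraction, device=0):
    """Apply the cap if PyTorch with CUDA is present; return True if applied."""
    try:
        import torch  # imported lazily so the sketch loads without PyTorch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # Real PyTorch call: limits this process to `fraction` of the device memory.
    torch.cuda.set_per_process_memory_fraction(fraction, device)
    return True
```

For a 24 GiB card, `cap_in_bytes(24 * 1024**3, 0.5)` is 12 GiB, meaning a fraction of 0.5 would make allocations fail beyond that point even though more memory is physically free.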

Gourieff commented 1 year ago

> > > Can confirm, this works for XL. Not sure why I can't keep both models loaded, given that I should have plenty of VRAM to spare with 24GB :/
> > >
> > > PS: It now hangs quite a bit during the swap; image generation at 1024x1024 fluctuates wildly between 30 seconds and 2 minutes.
> >
> > I have 16GB VRAM, however only 8 is shared, so it does appear to depend on how much memory your GPU is set up to share.
>
> @wagontrader the card is dedicated to SD only. I run the iGPU for other tasks and my display. I was doing some reading, and the PyTorch allocated amount seems to be the VRAM set aside for SD to use, which would explain the OOM. Do you have any idea how I can just let SD use the whole VRAM?
>
> There seems to be a PyTorch setting for
>
> torch.cuda.set_per_process_memory_fraction()
>
> but I have no idea how or where to set it for A1111

You can see in the Windows Task Manager how much VRAM A1111 uses during processing:

The screenshots above are from https://allthings.how/how-to-check-vram-usage-on-windows-10/
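
As an alternative to Task Manager, `nvidia-smi` (installed with the NVIDIA driver) can report used and total VRAM from the command line. A minimal Python sketch, assuming `nvidia-smi` is on the PATH; the helper names `parse_smi` and `query_vram` are made up for illustration, and the values are in MiB.

```python
import subprocess

# Query flags supported by nvidia-smi; "nounits" leaves plain MiB numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_smi(output):
    """Turn lines such as '5376, 6144' into (used_mib, total_mib) tuples."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_vram():
    """Run nvidia-smi and return one (used_mib, total_mib) tuple per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_smi(result.stdout)
```

Calling `query_vram()` while a generation is running gives one tuple per GPU, which makes it easier to paste exact numbers into a report than a screenshot does.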

wagontrader commented 1 year ago

@Gourieff It does appear to use dedicated, not shared VRAM. As far as I know, the only way to change the amount dedicated to the GPU is through the BIOS. I will give it a try and see.

edit: mine is in an expansion slot, so I can't change how it was set up. As I go further down the rabbit hole, it appears that I only have 8GB of VRAM and all 8 are being used. I will run some more tests and see if I can get any more insight.

Acephalia commented 1 year ago

@wagontrader thank you. Appreciate the reply.

This just further proves my point. Only about 12GB of VRAM is being used, and it then gives me OOM errors as soon as it goes over that. There is at least 10GB just sitting there doing nothing. I use the 24GB card only for SD; the system is using its own GPU, which is why I can't quite understand what is going on.

Acephalia commented 1 year ago

Okay good news. I managed to fix my issue. Here are the steps if anyone else needs it:

Delete your venv folder.
Download and install: https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_522.06_windows.exe (If you have any other CUDA toolkit version installed, uninstall all of its components first.)
Run webui.bat to recreate the venv. Once it finishes, close A1111.
Download https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn_8.7.0.84_windows.exe, then open it with WinRAR or similar. Go to cudnn > libcudnn > bin and copy all of the files there into stable-diffusion-webui\venv\Lib\site-packages\torch\lib, overwriting the existing ones.
Done.
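The copy step above can also be scripted. This is only a sketch under assumptions: it copies `.dll` files only (the steps say to copy everything in the bin folder), the function name is hypothetical, and the extraction folder in the usage comment is made up.

```python
import shutil
from pathlib import Path


def copy_cudnn_dlls(cudnn_bin: Path, torch_lib: Path) -> list[str]:
    """Copy every .dll from cudnn_bin into torch_lib, overwriting existing files.

    Returns the names of the copied files in sorted order.
    """
    if not torch_lib.is_dir():
        raise FileNotFoundError(f"torch lib folder not found: {torch_lib}")
    copied = []
    for dll in sorted(cudnn_bin.glob("*.dll")):
        shutil.copy2(dll, torch_lib / dll.name)  # replaces a file of the same name
        copied.append(dll.name)
    return copied


# Example (the source folder is a hypothetical extraction location):
# copy_cudnn_dlls(
#     Path(r"C:\temp\cudnn\libcudnn\bin"),
#     Path(r"stable-diffusion-webui\venv\Lib\site-packages\torch\lib"),
# )
```

Backing up the original torch\lib folder first would make it easy to undo the change if the replaced files cause problems.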

I can load both the refiner and checkpoint again on my 24gb now and the pytorch allocation scales as needed. 8-10 seconds to generate 1080x1080. Hope this helps anyone else who has been stuck with this too!

wagontrader commented 1 year ago

@Acephalia I restarted my PC so that I could take a look at the BIOS, and after the reboot I took another look to see if I could figure anything else out. I set up the UI to load 2 checkpoints and set them both to be stored in VRAM (unchecked the option to only keep one model on the device). Ran some tests using XL Base and XL Refiner and everything is working as it should now.

A simple reboot did the trick for me.

Gourieff commented 1 year ago

@wagontrader thanks my friend! A reboot helped somehow 🤔 I also purged the old venv and reinstalled it before that (maybe that helped as well). I even tried running A1111 (5.3Gb model) and ComfyUI Portable (4Gb model) at the same time, and it works 🤷‍♂️

aleph23 commented 11 months ago

@Acephalia What version of torch did you have installed? Specifically, what I'm wondering is whether it has "+cu118" appended to the name, like 'torch-2.0.1+cu118'. If not, installing torch yourself via pip install torch==2.0.1+cu118 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 might work better for you. The cuDNN files from PyTorch are supposed to be optimized for Torch. They have the same version number as NVIDIA's (6.14.11.11080) but a different file size and a much more recent build date.

SDwebui's built-in torch install command doesn't append '+cu118' and so installs a slightly different version when installing from scratch.
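A quick way to check which build is installed, sketched here with hypothetical helper names. It reads the version string from the package metadata, so it should be run with the venv's Python; the part after the "+" is the local build tag described above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def cuda_build_tag(torch_version: str) -> Optional[str]:
    """Return the local build tag of a version string, or None if there is none.

    For "2.0.1+cu118" this is "cu118"; for "2.0.1" it is None.
    """
    _, separator, local = torch_version.partition("+")
    return local if separator and local else None


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version string, or None if torch is missing."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


# Example, run inside the venv:
# v = installed_torch_version()
# print(v, cuda_build_tag(v) if v else None)
```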

lookbothways commented 11 months ago

I set up the UI to load 2 checkpoints and set them both to be stored in VRAM

Hi - how did you do this?

Acephalia commented 11 months ago

@aleph23 Can’t quite remember what version of torch I had but all my issues are now fixed. Here are the full steps for anyone else who needs them.

@lookbothways WebUI > Settings > Stable Diffusion. It's the first slider.
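For anyone running with `--api`, the same two settings can probably be changed over HTTP. Treat everything specific in this sketch as an assumption: the option keys `sd_checkpoints_limit` and `sd_checkpoints_keep_in_cpu` are my understanding of what the slider and the "only keep one model on device" checkbox are called internally, and the base URL is the usual local default. A GET request to `/sdapi/v1/options` should show the real key names on your install.

```python
import json
import urllib.request


def build_options(checkpoint_limit: int, keep_others_in_ram: bool) -> dict:
    """Build the options payload (key names are assumptions, see above)."""
    if checkpoint_limit < 1:
        raise ValueError("checkpoint_limit must be at least 1")
    return {
        "sd_checkpoints_limit": checkpoint_limit,
        "sd_checkpoints_keep_in_cpu": keep_others_in_ram,
    }


def post_options(options: dict, base_url: str = "http://127.0.0.1:7860") -> int:
    """POST the options to a running web UI and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/options",
        data=json.dumps(options).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example: keep two checkpoints loaded, both in VRAM.
# post_options(build_options(2, keep_others_in_ram=False))
```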
