AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

torch.cuda.OutOfMemoryError: CUDA out of memory (RTX 4090) [SOLVED] #13878

Open ericko777 opened 11 months ago

ericko777 commented 11 months ago

Is there an existing issue for this?

What happened?

I tried to generate an image in txt2img with Hires. fix set to 2x upscaling and got this error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 426.00 MiB (GPU 0; 23.99 GiB total capacity; 4.43 GiB already allocated; 17.81 GiB free; 4.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This is an RTX 4090 with 17.81 GiB free, yet an allocation of only 426.00 MiB fails. :-/
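The numbers in that message can be compared directly. The sketch below is only an illustration, not part of the webui: the pattern and the helper name `summarize_oom` are made up here, and the regular expression assumes the older PyTorch wording quoted above. It checks whether the request would fit in the reported free memory and how far "reserved" exceeds "allocated", which is the case the `max_split_size_mb` hint is aimed at.

```python
import re

# Assumption: this pattern targets the older PyTorch OOM wording quoted above.
OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<req>[\d.]+) (?P<req_unit>[MG]iB) "
    r"\(GPU \d+; (?P<total>[\d.]+) GiB total capacity; "
    r"(?P<alloc>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved in total by PyTorch\)"
)


def summarize_oom(message):
    """Return the sizes from an OOM message in GiB, or None if it does not match."""
    m = OOM_PATTERN.search(message)
    if m is None:
        return None
    # Convert the requested size to GiB so every value uses the same unit.
    requested = float(m["req"]) / (1024 if m["req_unit"] == "MiB" else 1)
    allocated = float(m["alloc"])
    free = float(m["free"])
    reserved = float(m["reserved"])
    return {
        "requested_gib": requested,
        "free_gib": free,
        # A large gap here would point at fragmentation inside PyTorch's cache.
        "reserved_minus_allocated_gib": reserved - allocated,
        "request_fits_in_free": requested < free,
    }


msg = (
    "CUDA out of memory. Tried to allocate 426.00 MiB (GPU 0; 23.99 GiB total capacity; "
    "4.43 GiB already allocated; 17.81 GiB free; 4.56 GiB reserved in total by PyTorch)"
)
info = summarize_oom(msg)
print(info)
```

For this message the request is far smaller than the reported free memory and the gap between reserved and allocated is small, so fragmentation inside PyTorch's cache does not look like the cause.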

set COMMANDLINE_ARGS= --opt-sdp-attention --autolaunch --update-all-extensions --api

Steps to reproduce the problem

  1. Go to TXT2IMG
  2. Enter a prompt
  3. Enable Hires. fix and set "Upscale by" to 2
  4. Generate

What should have happened?

Should work!

Sysinfo

app: stable-diffusion-webui.git
updated: 2023-11-03
hash: 4afaaf8a
url: https://github.com/AUTOMATIC1111/stable-diffusion-webui.git/tree/master
arch: AMD64
cpu: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
system: Windows
release: Windows-10-10.0.22621-SP0
python: 3.10.6 2.0.0+cu118 autocast half
device: NVIDIA GeForce RTX 4090 (1) (compute_37) (8, 9)
cuda: 11.8
cudnn: 8800
driver: 545.84
ram: free:16.18 used:15.78 total:31.96
gpu: free:19.42 used:4.57 total:23.99
gpu-active: current:2.94 peak:4.53
gpu-allocated: current:2.94 peak:4.53
gpu-reserved: current:2.95 peak:4.56
gpu-inactive: current:0.01 peak:0.29
events: retries:2 oom:1
utilization: 0
xformers: 0.0.17
diffusers: 0.18.1
transformers: 4.27.4
active: cuda
dtype: torch.float16
vae: torch.float16
unet: torch.float16
Memory optimization:None
Cross-attention:sdp

What browsers do you use to access the UI ?

Google Chrome

Console logs

---
INFO:sd_dynamic_prompts.dynamic_prompting:Prompt matrix will create 8 images in a total of 1 batches.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00,  3.64it/s]
  0%|                                                                                                              | 0/20 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: ('task(l4ompe84qxkj6dy)', 'flower,', '', [], 20, 'Euler a', 1, 8, 7, 680, 512, True, 0.4, 2, 'Latent', 20, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], <gradio.routes.Request object at 0x00000206218748B0>, 0, False, '', 0.8, -1, False, -1, 0, 0, 0, True, False, 1, False, False, False, 1.1, 1.5, 100, 0.7, False, False, True, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', False, 'x264', 'blend', 10, 0, 0, False, True, True, True, 'intermediate', 'animation', False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x0000020621877FA0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x0000020621876200>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x0000020621993B80>, False, 'None', 20, None, False, '0', 'C:\\stable-diffusion-webui\\models\\roop\\inswapper_128.onnx', 'CodeFormer', 1, '', 1, 1, False, True, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, 'Blur First V1', 0.25, 10, 10, 10, 10, 1, False, '', '', 0.5, 1, False, 5, 'all', 'all', 'all', '', '', '', '1', 'none', False, '', '', 'comma', '', True, '', '20', 'all', 'all', 'all', 'all', 0, '', 'Not set', True, True, '', '', '', '', '', 1.3, 'Not set', 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', 1.3, 'Not set', False, 'None', None, None, False, None, None, False, None, None, False, 50, True) {}
    Traceback (most recent call last):
      File "C:\stable-diffusion-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\stable-diffusion-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\stable-diffusion-webui\modules\txt2img.py", line 55, in txt2img
        processed = processing.process_images(p)
      File "C:\stable-diffusion-webui\modules\processing.py", line 732, in process_images
        res = process_images_inner(p)
      File "C:\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "C:\stable-diffusion-webui\modules\processing.py", line 867, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "C:\stable-diffusion-webui\modules\processing.py", line 1156, in sample
        return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
      File "C:\stable-diffusion-webui\modules\processing.py", line 1242, in sample_hr_pass
        samples = self.sampler.sample_img2img(self, samples, noise, self.hr_c, self.hr_uc, steps=self.hr_second_pass_steps or self.steps, image_conditioning=image_conditioning)
      File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "C:\stable-diffusion-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "C:\stable-diffusion-webui\modules\sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\modules\sd_samplers_cfg_denoiser.py", line 169, in forward
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "C:\stable-diffusion-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "C:\stable-diffusion-webui\modules\sd_hijack_utils.py", line 28, in __call__
        return self.__orig_func(*args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 858, in apply_model
        x_recon = self.model(x_noisy, t, **cond)
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 1335, in forward
        out = self.diffusion_model(x, t, context=cc)
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\modules\sd_unet.py", line 91, in UNetModel_forward
        return ldm.modules.diffusionmodules.openaimodel.copy_of_UNetModel_forward_for_webui(self, x, timesteps, context, *args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 797, in forward
        h = module(h, emb, context)
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\openaimodel.py", line 84, in forward
        x = layer(x, context)
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 334, in forward
        x = block(x, context=context[i])
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 269, in forward
        return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 121, in checkpoint
        return CheckpointFunction.apply(func, len(inputs), *args)
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 506, in apply
        return super().apply(*args, **kwargs)  # type: ignore[misc]
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\util.py", line 136, in forward
        output_tensors = ctx.run_function(*ctx.input_tensors)
      File "C:\stable-diffusion-webui\repositories\stable-diffusion-stability-ai\ldm\modules\attention.py", line 272, in _forward
        x = self.attn1(self.norm1(x), context=context if self.disable_self_attn else None) + x
      File "C:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\stable-diffusion-webui\modules\sd_hijack_optimizations.py", line 533, in scaled_dot_product_attention_forward
        hidden_states = torch.nn.functional.scaled_dot_product_attention(
    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 426.00 MiB (GPU 0; 23.99 GiB total capacity; 4.84 GiB already allocated; 17.50 GiB free; 4.87 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

---

Additional information

No response

ericko777 commented 11 months ago

I just realised that even though my RTX 4090 has 24 GB, only 8.5 GB at most is being used. Does anyone know why?

It was not like that when I started from a clean install of AUTOMATIC1111. After I copied the extensions over from my other folder, it installed Torch 2.0.1 and other packages, and since then usage has been capped at 8.5 GB.

ericko777 commented 11 months ago

After starting again from a clean install, it no longer seems to crash, even with Hires. fix at 3x.

It also uses more than 8.5 GB now: 13.7 GB at 3x upscaling.

ericko777 commented 11 months ago

Bad news this morning: it is still crashing. :-/

thegreenthumb007 commented 11 months ago

cuda: 11.8 cudnn: 8800 driver: 545.84

The CUDA version looks odd. You can try updating it; see https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html. Driver 545.84 is meant to be paired with CUDA 12.3.x.

ericko777 commented 11 months ago

@thegreenthumb007 I now have cudnn: 8906, but I cannot work out how to change cuda: 11.8 to 12.

I tried installing CUDA 12.3 (cuda_12.3.0_545.84_windows.exe, https://developer.download.nvidia.com/compute/cuda/12.3.0/local_installers/cuda_12.3.0_545.84_windows.exe),

but it still reports 11.8. I have not restarted yet; could that be the only reason?

device: NVIDIA GeForce RTX 4090 (1) (compute_37) (8, 9) cuda: 11.8 cudnn: 8906 driver: 545.84

I have seen benchmarks run with "torch: 2.2.0.dev20231025+cu121" (cuda: 12.1 cudnn: 8902 driver: 525.53, 24 GB),

but I cannot figure out how to install it. Do you know how?
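A note on why the reported value may not move: PyTorch's pip wheels bundle their own CUDA runtime, so the `cuda:` field most likely reflects the torch build (for example `+cu118`) rather than the system-wide toolkit installer. The sketch below, with a hypothetical helper name `cuda_from_torch_version`, reads the CUDA version out of a torch version string and, if torch is importable, prints what torch itself reports.

```python
import re


def cuda_from_torch_version(version):
    """Map a torch version string such as '2.0.0+cu118' to '11.8'.

    Returns None for CPU-only or untagged builds.
    """
    m = re.search(r"\+cu(\d+)$", version)
    if m is None:
        return None
    digits = m.group(1)  # e.g. '118' or '121'
    # The last digit is the minor version, everything before it the major.
    return f"{digits[:-1]}.{digits[-1]}"


for v in ("2.0.0+cu118", "2.2.0.dev20231025+cu121"):
    print(v, "->", cuda_from_torch_version(v))

# With torch installed in the active environment, ask torch directly.
try:
    import torch

    print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())
except ImportError:
    print("torch is not installed in this interpreter")
```

Run this with the Python interpreter inside the webui's venv, since that is the torch build the UI actually loads.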

ericko777 commented 11 months ago

Finally got it: I installed them in ..\AppData\Local\Programs\Python\Python310\Lib\site-packages and copied torch and torchvision to ..\webui\venv\Lib\site-packages

device: NVIDIA GeForce RTX 4090 (1) (sm_90) (8, 9) cuda: 12.1 cudnn: 8801 driver: 545.84

test time!

ericko777 commented 11 months ago

Same problem :-/

return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 680.00 MiB. GPU 0 has a total capacty of 23.99 GiB of which 16.57 GiB is free. Of the allocated memory 4.98 GiB is allocated by PyTorch, and 846.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

ericko777 commented 11 months ago

It works very well with a batch of 8 images, but as soon as I use Hires. fix it cannot use all of my 24 GB.

INFO:sd_dynamic_prompts.dynamic_prompting:Prompt matrix will create 8 images in a total of 1 batches.
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:05<00:00, 3.90it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 40/40 [19:00<00:00, 28.52s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 40/40 [19:00<00:00, 1.55it/s]

ericko777 commented 11 months ago

It works with this driver, with Hires. fix at 2x at most:

GeForce Game Ready Driver - WHQL. Driver version: 546.01 - Release date: October 31, 2023

thegreenthumb007 commented 11 months ago

You can try two things.

  1. Check the CUDA environment variable, and update cuDNN: https://developer.nvidia.com/rdp/cudnn-archive
  2. If it still fails, clean the environment: uninstall the GPU driver and CUDA, delete all their files, then reinstall them. You can also try reinstalling some of the Python packages.

By the way, my RTX 4090 works well, so something is probably not installed correctly on your side.

ericko777 commented 11 months ago

I solved the problem by setting the system Virtual Memory to 60GB
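The thread does not say why a larger page file helps, so the following only shows how to inspect the setting rather than explain the fix. On Windows, "virtual memory" in the system settings controls the page file, and the commit limit is physical RAM plus page files. The sketch below reads that limit through the Win32 `GlobalMemoryStatusEx` call via `ctypes`; the helper name `commit_limit_gib` is made up for this example, and it returns None on other platforms.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    # Field layout of the Win32 MEMORYSTATUSEX structure.
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def commit_limit_gib():
    """Return (commit limit, available commit) in GiB, or None off Windows."""
    if sys.platform != "win32":
        return None
    status = MEMORYSTATUSEX()
    # The API requires dwLength to be set before the call.
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    gib = 1024 ** 3
    return status.ullTotalPageFile / gib, status.ullAvailPageFile / gib


print(commit_limit_gib())
```

Comparing the first value before and after changing the virtual memory setting shows whether the new page file size has taken effect.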