CUDA out of memory with 12 GB VRAM

I am trying to get started with stable-diffusion and tested it on two machines:

NVIDIA Titan X, 12 GB VRAM
Laptop with NVIDIA RTX A500, 4 GB VRAM

In both cases I get "CUDA out of memory" when running the test example from the GitHub page: python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms

I also tried some suggestions found online: passing "--H 512 --W 512" to lower the resolution, adding "--n_samples 1", calling "torch.cuda.empty_cache()" inside txt2img.py, and setting "set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024" (also 512 and other values).

None of this helps. Is it even possible to run it with 12 GB of VRAM?
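For reference, here is a minimal sketch of how the settings listed above could be combined into a single launch. It only builds the command line and the environment; `build_launch` is a hypothetical helper, not part of the repository. The allocator setting has to be present in the environment of the Python process itself (in a POSIX shell, `set NAME=value` does not export a variable), so passing the environment explicitly removes that doubt.

```python
import os
import sys


def build_launch(prompt, height=512, width=512, n_samples=1, max_split_size_mb=512):
    """Return (command, env) for a txt2img.py run with reduced memory settings.

    Hypothetical helper for illustration; the flags are the ones quoted above.
    """
    env = dict(os.environ)
    # Must be in the environment of the process that runs PyTorch.
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    cmd = [
        sys.executable, "scripts/txt2img.py",
        "--prompt", prompt,
        "--plms",
        "--H", str(height),
        "--W", str(width),
        "--n_samples", str(n_samples),
    ]
    return cmd, env


cmd, env = build_launch("a photograph of an astronaut riding a horse")
print(" ".join(cmd[1:]))
print(env["PYTORCH_CUDA_ALLOC_CONF"])

# To actually start the run from the repository root:
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```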

Wed Oct  4 11:55:26 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX TITAN X     Off | 00000000:03:00.0 Off |                  N/A |
| 22%   53C    P8              15W / 250W |     98MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   2997735      G   /usr/libexec/Xorg                            79MiB |
|    0   N/A  N/A   2997781      G   /usr/bin/gnome-shell                         11MiB |
+---------------------------------------------------------------------------------------+

Traceback (most recent call last):
  File "scripts/txt2img.py", line 357, in <module>
    main()
  File "scripts/txt2img.py", line 308, in main
    samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/models/diffusion/plms.py", line 97, in sample
    samples, intermediates = self.plms_sampling(conditioning, size,
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/models/diffusion/plms.py", line 152, in plms_sampling
    outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/models/diffusion/plms.py", line 218, in p_sample_plms
    e_t = get_model_output(x, t)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/models/diffusion/plms.py", line 185, in get_model_output
    e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/models/diffusion/ddpm.py", line 987, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1410, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 737, in forward
    h = module(h, emb, context)
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/attention.py", line 258, in forward
    x = block(x, context=context)
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/attention.py", line 209, in forward
    return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/diffusionmodules/util.py", line 114, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/diffusionmodules/util.py", line 127, in forward
    output_tensors = ctx.run_function(*ctx.input_tensors)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/attention.py", line 212, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "/home/ats/.conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Develop/TensorFlow/stable-diffusion/ldm/modules/attention.py", line 189, in forward
    attn = sim.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 3.00 GiB (GPU 0; 11.92 GiB total capacity; 7.28 GiB already allocated; 866.00 MiB free; 10.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
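
The 3.00 GiB request at "attn = sim.softmax(dim=-1)" can be sanity-checked with some arithmetic. The sketch below is an estimate, not code from the repository, and `attention_scores_bytes` is a hypothetical helper name. It assumes 8 attention heads, an 8x downsampling from pixels to latents, 4 bytes per score (float32), the script's default of 3 samples per batch, and a batch doubled by classifier-free guidance (suggested by the `.chunk(2)` call in the traceback).

```python
GIB = 1024 ** 3


def attention_scores_bytes(height, width, n_samples,
                           heads=8, downsample=8,
                           bytes_per_elem=4, guidance_factor=2):
    """Estimate the size of one self-attention score matrix at full latent resolution.

    All default values are assumptions about the v1 model and the sampling script.
    """
    # Number of latent positions, e.g. 64 * 64 = 4096 for a 512x512 image.
    tokens = (height // downsample) * (width // downsample)
    # Conditional and unconditional inputs are assumed to be batched together.
    batch = n_samples * guidance_factor
    # One (tokens x tokens) matrix per batch element and per head.
    return batch * heads * tokens * tokens * bytes_per_elem


print(attention_scores_bytes(512, 512, n_samples=3) / GIB)  # 3.0
print(attention_scores_bytes(512, 512, n_samples=1) / GIB)  # 1.0
```

Under these assumptions a run with 3 samples per batch needs exactly 3 GiB for a single score matrix, which matches the failed allocation, while "--n_samples 1" would need 1 GiB for the same matrix. The size grows with the square of the number of latent positions, so resolution has a much larger effect than batch size.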

CompVis / stable-diffusion

CUDA out of memory with 12 GB VRAM #801