AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

CUDA out of memory though I still got a lot of free VRAM #9415

Open PBoy20511 opened 1 year ago

PBoy20511 commented 1 year ago

Is there an existing issue for this?

What happened?

I'm running Stable Diffusion on my 4090, but this keeps showing up: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 23.99 GiB total capacity; 4.82 GiB already allocated; 16.48 GiB free; 4.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Even though I still have 16.48 GiB free, it keeps showing CUDA out of memory. Does anyone know what the problem is?
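For reference, the numbers in that message can be cross-checked from inside the process that hits the error. A minimal diagnostic sketch (assuming only that torch is installed; a large gap between reserved and allocated memory points at fragmentation, which is what max_split_size_mb is meant to limit):

```python
import torch

# What PyTorch has handed out to live tensors vs. what its caching
# allocator has reserved from the driver. A big reserved-minus-allocated
# gap suggests fragmentation rather than a genuinely full card.
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
print(f"allocated: {allocated / 2**30:.2f} GiB")
print(f"reserved:  {reserved / 2**30:.2f} GiB")

# Full per-pool breakdown, including allocation/free counters.
print(torch.cuda.memory_summary(device=0))
```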

Steps to reproduce the problem

  1. Go to ....
  2. Press ....
  3. ...

What should have happened?

It shouldn't run out of CUDA memory.

Commit where the problem happens

Not sure

What platforms do you use to access the UI ?

No response

What browsers do you use to access the UI ?

No response

Command Line Arguments

I use webui-user.bat

List of extensions

Lora

Console logs

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 23.99 GiB total capacity; 4.82 GiB already allocated; 16.48 GiB free; 4.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Additional information

No response

Trung0246 commented 1 year ago

What did you try to do? Textual Inversion training? Normal image generation?

crappypatty commented 1 year ago

Try setting this as a system variable under Environment Variables: variable name PYTHON_CUDA_ALLOC_CONF, variable value max_split_size_mb:1024

50mkw commented 1 year ago

I have same problem

50mkw commented 1 year ago

> Try setting this as a system variable under Environment Variables: variable name PYTHON_CUDA_ALLOC_CONF, variable value max_split_size_mb:1024

It's not working in this case; it seems like somewhere the system is limiting how much memory PyTorch can reserve.

crappypatty commented 1 year ago

> > Try setting this as a system variable under Environment Variables: variable name PYTHON_CUDA_ALLOC_CONF, variable value max_split_size_mb:1024
>
> It's not working in this case; it seems like somewhere the system is limiting how much memory PyTorch can reserve.

I miswrote the variable name lol. It is actually PYTORCH_CUDA_ALLOC_CONF. I didn't notice it.
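For anyone copying this: the variable has to be visible to the process before PyTorch sets up its CUDA allocator. A minimal sketch in Python (the 1024 value just mirrors the suggestion above; tune it for your workload):

```python
import os

# Must be set before torch initializes its CUDA allocator, so do it
# before the import. max_split_size_mb caps how large a cached block
# the allocator will split, which reduces fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:1024"

import torch  # imported after setting the env var on purpose
```

For the webui itself, the equivalent should be a `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024` line in `webui-user.bat`, so the variable is already in place when the launcher starts Python.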

tankwyn commented 1 year ago

Check the output of nvidia-smi and see whether there are any zombie processes; kill them if so. This usually happens when a client is still requesting something while the server has been shut down.
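If you'd rather script that check, a small sketch (assuming nvidia-smi is on PATH; the query flags below are standard nvidia-smi options):

```python
import subprocess

# List every process currently holding GPU memory.
out = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)
# A PID that belongs to a dead or hung client can then be killed,
# e.g. taskkill /PID <pid> /F on Windows or kill <pid> on Linux.
```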

minkhantDoctoral commented 1 year ago

I got a similar error. My GPU is an NVIDIA GeForce RTX 3050 with 8 GB (7.6 GB available: 4 GB dedicated GPU memory and 3.6 GB shared GPU memory). When I run the pretrained Stable Diffusion pipeline, all of the memory is utilized and no out-of-memory error occurs. But when I run it step by step, I get an out-of-memory error at the step unet.to(torch_device).

I was following the notebook at https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb

Below are the versions of the modules I used:
!pip install diffusers==0.10.0
!pip install huggingface-hub>=0.11.1
!pip install transformers==4.25.1
!pip install ftfy==6.1.1
!pip install accelerate==0.15.0

My CUDA version is 12.1; detailed info is in the attached screenshot.
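One thing that may help in that step-by-step notebook: load the UNet in half precision before moving it to the GPU, which roughly halves its footprint. A sketch assuming the diffusers==0.10.0 pin above and the CompVis/stable-diffusion-v1-4 checkpoint the notebook uses (the text embeddings and latents fed to the UNet then have to be float16 as well):

```python
import torch
from diffusers import UNet2DConditionModel

torch_device = "cuda"

# Loading the UNet weights as float16 roughly halves its VRAM footprint,
# which can matter on a card with 4 GB of dedicated memory.
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    subfolder="unet",
    revision="fp16",           # half-precision weights branch of the repo
    torch_dtype=torch.float16,
)
unet = unet.to(torch_device)   # the step that previously ran out of memory
```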