CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

help ! RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 10.92 GiB total capacity; 8.62 GiB already allocated; 1.39 GiB free; 8.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF #86

Open chinatian opened 2 years ago

chinatian commented 2 years ago

error message:

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 10.92 GiB total capacity; 8.62 GiB already allocated; 1.39 GiB free; 8.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

i have GeForce GTX 1080 Ti[11G]
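The figures in the error message can be read off mechanically. The sketch below is a hypothetical helper (parse_oom is not part of PyTorch or of this repository) that pulls the numbers out of the message and compares reserved with allocated memory. When the two are nearly equal, as here, fragmentation is probably not the main problem, and the card is simply close to full.

```python
import re

# Conversion factors to GiB for the units PyTorch prints.
_UNIT_TO_GIB = {"bytes": 1 / 1024**3, "KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0}
_QTY = r"([\d.]+) (GiB|MiB|KiB|bytes)"
_OOM_RE = re.compile(
    rf"Tried to allocate {_QTY} \(GPU \d+; {_QTY} total capacity; "
    rf"{_QTY} already allocated; {_QTY} free; {_QTY} reserved"
)


def parse_oom(message):
    """Return the sizes from a CUDA out-of-memory message, all in GiB."""
    match = _OOM_RE.search(message)
    if match is None:
        raise ValueError("not a recognised CUDA out-of-memory message")
    groups = match.groups()  # alternating (value, unit) pairs
    names = ("requested", "total", "allocated", "free", "reserved")
    return {
        name: float(groups[2 * i]) * _UNIT_TO_GIB[groups[2 * i + 1]]
        for i, name in enumerate(names)
    }


message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB "
    "(GPU 0; 10.92 GiB total capacity; 8.62 GiB already allocated; "
    "1.39 GiB free; 8.81 GiB reserved in total by PyTorch)"
)
stats = parse_oom(message)
slack = stats["reserved"] - stats["allocated"]
print(f"requested {stats['requested']:.2f} GiB, free {stats['free']:.2f} GiB")
print(f"reserved but unused by PyTorch: {slack:.2f} GiB")
```

The request (1.50 GiB) is larger than what is free (1.39 GiB), while PyTorch is holding only a small amount beyond what it has actually allocated.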

F0rt1s commented 2 years ago

You ran out of GPU memory. Try using nvitop to monitor your gpu memory usage. You can try this branch: https://github.com/basujindal/stable-diffusion It trades speed for a lower memory usage.

Naxter commented 2 years ago

You can also just reduce the width and height of the output by using the parameters --H and --W

chyld commented 2 years ago

Also, try using --n_samples 1.
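As a rough illustration of combining the flags mentioned so far, the sketch below assembles a txt2img.py command line in Python. build_txt2img_command is a hypothetical helper, and its default values are placeholders, not the script's own defaults. Only the flags themselves (--prompt, --H, --W, --n_samples, --plms) come from this thread.

```python
import shlex


def build_txt2img_command(prompt, height=512, width=512, n_samples=1, use_plms=True):
    """Assemble an argument list for scripts/txt2img.py from flags quoted in this thread."""
    command = [
        "python", "scripts/txt2img.py",
        "--prompt", prompt,
        "--H", str(height),
        "--W", str(width),
        "--n_samples", str(n_samples),
    ]
    if use_plms:
        command.append("--plms")
    return command


command = build_txt2img_command(
    "a photograph of an astronaut riding a horse", height=384, width=384
)
# Print a copy-pasteable shell command.
print(shlex.join(command))
# To run it from the repository root: subprocess.run(command, check=True)
```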

breadbrowser commented 2 years ago

just use this https://huggingface.co/spaces/stabilityai/stable-diffusion

titusfx commented 2 years ago

I have a Titan V (12 GB). It only worked with @chyld's tip:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1

[image: grid-0002]

After changing to --H 256 --W 256 the results were poor: [images: grid-0000, grid-0001]

xmvlad commented 2 years ago

I had the same issue on a GPU with 12 GB of VRAM. I just switched the model to float16 precision: in scripts/txt2img.py, function load_model_from_config, line 63, change model.cuda() to model.cuda().half()
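A back-of-envelope estimate of what halving the weights saves, as a sketch only. The parameter counts (860M for the UNet, 123M for the text encoder) are an assumption taken from the project README, not from this thread, and the autoencoder is left out. Activations also take memory, so this is a lower bound on total usage, not a prediction of it.

```python
GIB = 1024 ** 3

# Assumed parameter counts (project README: 860M UNet, 123M text encoder).
PARAMETER_COUNTS = {"unet": 860_000_000, "text_encoder": 123_000_000}


def weights_gib(bytes_per_parameter):
    """Memory needed just to hold the weights, in GiB."""
    total_parameters = sum(PARAMETER_COUNTS.values())
    return total_parameters * bytes_per_parameter / GIB


fp32 = weights_gib(4)  # float32: 4 bytes per parameter
fp16 = weights_gib(2)  # float16: 2 bytes per parameter
print(f"float32 weights: {fp32:.2f} GiB")
print(f"float16 weights: {fp16:.2f} GiB")
print(f"saved by .half(): {fp32 - fp16:.2f} GiB")
```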

JustinGuese commented 2 years ago

> Have same issue on GPU with 12GB VRAM. Just turned model to float16 precision. scripts/txt2img.py, function - load_model_from_config, line - 63, change from: model.cuda() to model.cuda().half()

@xmvlad would you say the quality had been reduced a lot?

JustinGuese commented 2 years ago

And yes, I would say you need at least 12 GB of VRAM.

JustinGuese commented 2 years ago

It helps to remove the SFW (safety) filter, since that model takes ~2 GB of VRAM. Just disable those lines, or use my txt2img.py: https://github.com/JustinGuese/stable-diffusor-docker-text2image/blob/master/txt2img.py

JustinGuese commented 2 years ago

You could also disable the watermark, but it does not use as much VRAM.

xmvlad commented 2 years ago

> would you say the quality had been reduced a lot?

No, the results were almost the same (I checked some prompts from the web).

rjamesnw commented 2 years ago

I have the same issue on Windows 10:

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 8.00 GiB total capacity; 5.62 GiB already allocated; 0 bytes free; 5.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It's not a low memory issue, it's a NO memory issue (after a PC restart), possibly because PyTorch is taking too much. Is there any way to reduce what it allocates?

rjamesnw commented 2 years ago

It seems this post did help to reduce the total size reserved by PyTorch: https://github.com/CompVis/stable-diffusion/issues/86#issuecomment-1230309710

I think Windows is allocating some for itself (I tried closing all apps and over 3 GB is still allocated). Using that post's solution helps, along with reducing the height, width, and number of samples: https://github.com/CompVis/stable-diffusion/issues/86#issuecomment-1228617044

kiranscaria commented 2 years ago

Faced the same issues. Things that worked for me.

  1. Loading the half-precision model as suggested by @xmvlad here.
  2. Disabling the safety checker and invisible watermarking.
  3. Reducing the number of samples to 1 (--n_samples 1).
  4. Reducing the height and width to 256. This severely affects the quality of the output.

phatal commented 2 years ago

Loading the half-precision model successfully fixed the error in txt2img. But when I try the same in img2img, I get a new error:

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

looks like I need to do halving with something else...

lthiet commented 2 years ago

@phatal

Replace line 54 of scripts/img2img.py with

image = np.array(image).astype(np.float16) / 255.0

And also make sure that your input picture has a dimension of 512x512. Compression rate does not matter.

That worked for me.
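The error above means the input tensor and the model weights have different dtypes. An alternative to hard-coding np.float16 is to cast the input to whatever dtype the weights currently have, so the same code works with or without .half(). The helper below is a hypothetical sketch (match_input_dtype does not exist in the repository), and the call site shown in the comment is an assumption about how img2img.py is laid out.

```python
def match_input_dtype(image_tensor, model):
    """Cast an input tensor to the dtype of the model's weights.

    Relies only on torch.nn.Module.parameters() and torch.Tensor.to(),
    so nothing here is specific to Stable Diffusion.
    """
    weight_dtype = next(model.parameters()).dtype
    if image_tensor.dtype == weight_dtype:
        return image_tensor  # already consistent, nothing to do
    return image_tensor.to(dtype=weight_dtype)


# Hypothetical call site in scripts/img2img.py, after the image has been
# loaded and moved to the GPU (the variable names are assumptions):
# init_image = match_input_dtype(init_image, model)
```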

rjamesnw commented 2 years ago

PyTorch is still taking a lot of memory, and it seems a lot of other GPU memory is taken up by something else while the command runs, because the resource monitor shows very little in use until then. Is 8 GB too low for a GPU for this system? I can only make 384x384 work at the most, but would like a higher-res image if possible. I already implemented the ideas above (reduce samples, and halve the model), but 512 fails:

> python scripts/txt2img.py --prompt "flying pig" --H 512 --W 512 --seed 27 --n_iter 1 --ddim_steps 100

RuntimeError: CUDA out of memory. Tried to allocate 3.00 GiB (GPU 0; 8.00 GiB total capacity; 3.65 GiB already allocated; 1.18 GiB free; 4.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

3.65 GiB already allocated, but it is not allocated before the command runs - especially after a restart and shutting down almost all processes using the GPU (except Windows obviously).
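One possible reading of the 3.00 GiB request, offered as a back-of-envelope sketch, not a measurement: it matches the size of a single self-attention score tensor at full latent resolution. The assumptions, none of which are stated in this thread, are 8 attention heads, a latent that is 1/8 of the image side, and a batch of 3 samples doubled by classifier-free guidance (the command shown does not pass --n_samples). The same arithmetic with 2-byte elements gives the 1.50 GiB from the original report.

```python
GIB = 1024 ** 3


def attention_matrix_gib(height, width, batch, heads=8, downsample=8, bytes_per_element=4):
    """Size in GiB of one self-attention score tensor at full latent resolution."""
    tokens = (height // downsample) * (width // downsample)
    elements = batch * heads * tokens * tokens  # one tokens x tokens matrix per head
    return elements * bytes_per_element / GIB


batch = 3 * 2  # assumed: 3 samples, doubled by classifier-free guidance
print(attention_matrix_gib(512, 512, batch))                       # 3.0 (float32 scores)
print(attention_matrix_gib(512, 512, batch, bytes_per_element=2))  # 1.5 (float16 scores)
print(attention_matrix_gib(384, 384, batch))                       # smaller image
print(attention_matrix_gib(512, 512, batch=2))                     # as with --n_samples 1
```

Because the tensor grows with the square of the number of latent positions, lowering the resolution or the number of samples shrinks this one allocation far more than halving the weights does.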

kiranscaria commented 2 years ago

@rjamesnw After switching to the half-precision model, I have seen GPU consumption peak at ~12-13 GB. To lower it further, you can refer to issue #95. You can also look at repos targeting a smaller VRAM footprint, like: https://github.com/SkylarKelty/stable-diffusion

hkiang01 commented 2 years ago

This worked for me https://constant.meiring.nz/playing/2022/08/04/playing-with-stable-diffusion.html

dyanechi commented 2 years ago

I managed to get it to work with an RTX 2060 that has only 6 GB of VRAM by using a lower resolution. Make sure to add model.to(torch.float16) in the load_model_from_config function, just before model.cuda() is called. If that's still not enough, change the resolution. For me 384x384 works well, but I also experimented with 256x768 and 320x704, which still produce good quality if you give it the right prompt.

[images: 09-09-2022_000931 seed-2688872 step-20 eta-0-35_768x256-0009, 00214, 00118]

drfinkus commented 2 years ago

I'm getting this error for img2img on an RTX 3090 on Ubuntu.

RuntimeError: CUDA out of memory. Tried to allocate 26.11 GiB (GPU 0; 23.70 GiB total capacity; 4.31 GiB already allocated; 16.35 GiB free; 5.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Using --n_samples 1 did not help.

What worked for me was resizing the source image and converting it to png, as follows:

convert source.jpg -resize 512x512 source.png

Hope this helps someone.
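For anyone who prefers to pick the target size in Python, here is a minimal sketch of the same idea: fit the image inside a 512x512 box while keeping the aspect ratio. fit_within is a hypothetical helper. Rounding down to a multiple of 64 is my assumption, chosen because the sizes quoted for the CompVis scripts in this thread (256, 320, 384, 512, 704, 768) are all multiples of 64. It can change the aspect ratio slightly.

```python
def fit_within(width, height, max_side=512, multiple=64):
    """Scale (width, height) to fit inside max_side, rounded down to a multiple."""
    longest = max(width, height)
    if longest > max_side:
        # Integer arithmetic avoids floating-point rounding surprises.
        width = width * max_side // longest
        height = height * max_side // longest
    # Round down to the chosen multiple, but never below one multiple.
    width = max(multiple, width // multiple * multiple)
    height = max(multiple, height // multiple * multiple)
    return width, height


print(fit_within(4000, 3000))  # (512, 384)
print(fit_within(1920, 1080))  # (512, 256)
print(fit_within(512, 512))    # (512, 512)

# With Pillow installed, the resize itself could look like this:
# from PIL import Image
# img = Image.open("source.jpg")
# img.resize(fit_within(*img.size)).save("source.png")
```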

JAsaxon commented 1 year ago

> it helps removing the sfw filter as the model takes ~2GB VRAM just disable the lines or use my txt2img.py https://github.com/JustinGuese/stable-diffusor-docker-text2image/blob/master/txt2img.py

I won't ask why you conveniently have a txt2img with the SFW filter removed...

rjamesnw commented 1 year ago

@dyanechi that worked for me. Needed to add --n_samples 1 (follow https://github.com/CompVis/stable-diffusion/issues/86#issuecomment-1236122289), but now I don't need to scale down, thanks.

mary5050 commented 1 year ago

RuntimeError: CUDA out of memory. Tried to allocate 2.15 GiB (GPU 0; 8.00 GiB total capacity; 6.26 GiB already allocated; 0 bytes free; 6.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

:( Impossible command!!!!

Bec-k commented 1 year ago

Same thing here, there is tons of available memory, but it keeps throwing that reserved memory is larger than allocated memory...

Florencia007 commented 1 year ago

Can you explain for dummies where and how I do this? thanks

Florencia007 commented 1 year ago

> it helps removing the sfw filter as the model takes ~2GB VRAM just disable the lines or use my txt2img.py https://github.com/JustinGuese/stable-diffusor-docker-text2image/blob/master/txt2img.py

I don't know where to put this file or what I need to change afterwards to load it. Can you explain it for dummies like me? LOL

MiguelPunkUchi commented 1 year ago

> @phatal Replace line 54 of scripts/img2img.py with
> image = np.array(image).astype(np.float16) / 255.0

I don't know where to replace this line or in which file. I am new, please help.

mary5050 commented 1 year ago

no, it is very small. I don't like that size 512

nishchalkarwade commented 1 year ago

I am trying for NLP and getting similar error on my NVIDIA MX350

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 2.00 GiB total capacity; 1.63 GiB already allocated; 0 bytes free; 1.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

windey1988 commented 1 year ago

try set COMMANDLINE_ARGS=--lowvram --precision full --no-half (Windows) or export COMMANDLINE_ARGS="--lowvram --precision full --no-half" (Linux/macOS)

dekanayake commented 1 year ago

Hi all, I have an Nvidia RTX 3060 GPU with 12 GB of VRAM. If I install an additional RTX 3060, will that fix this kind of VRAM limit issue? Do I need to connect them with Nvidia NVLink?

windey1988 commented 1 year ago

Another suggestion: set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024

JustinGuese commented 1 year ago

I think it's currently not possible to split the model across multiple GPUs. But as windey said, try the low-VRAM mode. It will result in lower quality, though.


aqil-s commented 1 year ago

How do I set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:1024?

windey1988 commented 1 year ago

I ran it via project stable-diffusion-webui and set the environment variable in webui-macos-env.sh or webui-user.bat.

aqil-s commented 1 year ago

You mean like editing the file in Notepad? I can't find a line that says PYTORCH_CUDA_ALLOC_CONF.

windey1988 commented 1 year ago

Yep. If you run it in stable-diffusion-webui, you can edit the environment variable in webui-macos-env.sh or webui-user.bat. If there is no line for PYTORCH_CUDA_ALLOC_CONF, you can add one to the file.
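If you run the Python scripts directly, not through the web UI, the variable can also be set from Python itself. The sketch below is a minimal illustration. configure_cuda_allocator is a hypothetical helper, and 1024 is simply the value suggested earlier in this thread, not a tuned recommendation. The variable has to be in place before PyTorch initialises its CUDA allocator, so the safest spot is before torch is imported.

```python
import os


def configure_cuda_allocator(max_split_size_mb=1024, env=os.environ):
    """Set PYTORCH_CUDA_ALLOC_CONF unless the shell already provides a value."""
    value = f"max_split_size_mb:{max_split_size_mb}"
    # setdefault keeps any value the user exported before starting Python.
    return env.setdefault("PYTORCH_CUDA_ALLOC_CONF", value)


active = configure_cuda_allocator()
print("PYTORCH_CUDA_ALLOC_CONF =", active)

# Import torch only after this point, so the CUDA caching allocator
# sees the variable when it is first initialised.
# import torch
```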

aqil-s commented 1 year ago

cool. thanks

LordBumThunder commented 1 year ago

Ok, so for clarity, I have built a pretty decent machine:

OS Name: Microsoft Windows 11 Enterprise
System Manufacturer: Micro-Star International Co., Ltd.
System Model: MPG B460 Trident AS (MS-B926)
Processor: Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz, 2901 Mhz, 8 Core(s), 16 Logical Processor(s)
MotherBoard Product: MPG B460I GAMING EDGE (MS-7C86)
Installed Physical Memory (RAM): 32.0 GB
Total Virtual Memory: 67.9 GB
Available Virtual Memory: 41.1 GB
Graphics Card Name: MSI GeForce RTX 4070 VENTUS 3X 12G OC
Adapter RAM: (1,048,576) bytes

But even with these components, I was still running into fucking CUDA errors.

Making these changes helped dramatically:
Line 63: model.to(torch.float16)
Line 64: model.cuda().half()

Also at the end of the FIRST section of my positive prompts (I say first section because you should break your prompts up into 5 sections), I always add "DLSS, Ray Tracing, uncensored, --n_samples 1" This does help a lot.

Now I'm able to create images with these settings:
Sampling Method: DDIM
Sampling Steps: 25
Restore Faces: Off
Hires: On
Upscale: 2.75
Denoising: 0.55
Batch Count: 10
Width: 404
Height: 652
CFG Scale: 11

I get that not everyone will have the capability to create images with these settings, but after making the changes above I have not run into any more CUDA errors, even when changing the to as high as 3

note: as I mentioned before, you should break your prompts up into 5 sections. If you want a better understanding of what I mean, check out this article. I explain it in detail: https://bit.ly/42kEgTp

JustinGuese commented 1 year ago

Hi @LordBumThunder, yeah with your setup it shouldn't be a problem. maybe reduce the batch size.

i can recommend using conda for pytorch setup as well, that worked pretty well for me. also ditch windows for anything ml related, or use wsl2, it has a nice gpu integration built in.

https://pytorch.org/

conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

and then check if your GPU is recognized:

import torch
print(torch.cuda.mem_get_info())

JustinGuese commented 1 year ago

That is basically what the AUTOMATIC1111 version (https://github.com/AUTOMATIC1111/stable-diffusion-webui) is doing as well. I can recommend it.

LordBumThunder commented 1 year ago

Anytime I enter the line of code you gave me: "import torch print(torch.cuda.mem_get_info())"

It gives me this error: At line:2 char:31

Let me know if there is anything else I can do for you.
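The "At line:2 char:31" message has the shape of a PowerShell parse error, which suggests the two Python lines were typed at the shell prompt and never reached Python. One way around that, sketched below, is to save the check in a file (check_gpu.py is a hypothetical name) and run it with python check_gpu.py. The function name and the messages are illustrative. torch.cuda.is_available() and torch.cuda.mem_get_info() are the actual PyTorch calls.

```python
def gpu_memory_report():
    """Describe free and total GPU memory, or explain why it cannot be read."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is visible"
    # mem_get_info() returns (free_bytes, total_bytes) for the current device.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    gib = 1024 ** 3
    return f"free {free_bytes / gib:.2f} GiB of {total_bytes / gib:.2f} GiB"


print(gpu_memory_report())
```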

pvbang commented 10 months ago

I am using a 4 GB GPU, and the simple way to fix this error is to switch to another model of about 2 GB. In webui-user.bat, edit COMMANDLINE_ARGS to "set COMMANDLINE_ARGS= --xformers --autolaunch". It fixed the error for me. It uses about 3.2 GB of GPU memory when creating a 500x500 image, and about 3.6 GB when creating a 720x1280 image.

I then tried a few ways to reduce GPU usage like the instructions above: