CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

OOM with Tesla T4 and small images #54

Open oconnoob opened 2 years ago

oconnoob commented 2 years ago

Hi there,

I'm running into an OOM error using a Tesla T4 on GCP (Ubuntu 18.04) with the v1.4 checkpoint. I see "Global seed set to 42" and "Loading model from sd-v1-4.ckpt" followed by "Killed", even when I reduce the image size to 128x128, then 32x32, and finally 2x2.

Any tips for figuring out what the issue is would be greatly appreciated!

smoke2007 commented 2 years ago

I got it somewhat working on an RTX 3060 Ti (8 GB). I also got OOM errors, but after reducing to 256x256 it continued. The output looks more like something dalle-mini would produce, though, not what I got from the DreamStudio web interface and the beta Discord.

leszekhanusz commented 2 years ago

How much RAM do you have on your motherboard? Did you try to create a swapfile on disk?

oconnoob commented 2 years ago

@leszekhanusz There is 15 GB of RAM on the motherboard for that VM. I haven't tried creating a swapfile yet - cutting the size all the way down to 32x32 and still getting OOM is unexpected to me.

oconnoob commented 2 years ago

@smoke2007 Interesting - thanks for the comparison. I'm trying on a K80 with 30 GB motherboard RAM. Seems to be working - will update when/if I get results.

leszekhanusz commented 2 years ago

You could try the float16 version available in diffusers v0.2.4

It works for me with 512x512 images on an RTX 3080 (10 GB VRAM) and 32 GB of CPU RAM.

smoke2007 commented 2 years ago

> You could try the float16 version available in diffusers v0.2.4
>
> It works for me with 512x512 images on an RTX 3080 (10 GB VRAM) and 32 GB of CPU RAM.

I'm not really using anything from Hugging Face aside from the checkpoint file. I'm using the code from this GitHub repo with conda.

smoke2007 commented 2 years ago

I just redeployed everything using this fork https://github.com/lstein/stable-diffusion

and now it works fine at 512x512, and the output is much more like what I would expect.

Utopiah commented 2 years ago

Thanks @smoke2007, I was stuck in OOM territory too, but this fork solved it (using the v1.4 checkpoint for CompVis from Hugging Face, not diffusers).

oconnoob commented 2 years ago

@leszekhanusz I'm not using diffusers either - is there any way to use the float16 version with just the source and a checkpoint?
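
For example, would patching scripts/txt2img.py along these lines work? (Untested sketch - just casting the loaded model to fp16 before moving it to the GPU; the script already samples inside autocast("cuda").)

# untested sketch against scripts/txt2img.py
model = load_model_from_config(config, f"{opt.ckpt}")  # existing load call
model = model.half()       # cast the weights to fp16 to roughly halve VRAM use
model = model.to(device)   # existing move to GPU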

oconnoob commented 2 years ago

@smoke2007 is there any difference between dream.py in that fork and txt2img.py in the original?

smoke2007 commented 2 years ago

> @smoke2007 is there any difference between dream.py in that fork and txt2img.py in the original?

Yes, the code is quite different: the safety filter seems to be off, some verbose output is suppressed, and it optimizes some things for memory.

It also gives you an interactive prompt that understands the parameters like the Discord bot did.

oconnoob commented 2 years ago

@smoke2007 It seems there's an orig_scripts directory that preserves the originals, but it looks like extra imports are required. I'd love to get a fix in the original repo when possible, but I'll try out this fork in the meantime!

breadbrowser commented 2 years ago

just use this https://huggingface.co/spaces/stabilityai/stable-diffusion

patrickvonplaten commented 2 years ago

@oconnoob,

this gets you down to 5-6 GB of VRAM:

# !pip install diffusers
from torch import autocast
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
    revision="fp16",
    torch_dtype=torch.float16,  # fp16 weights (torch_dtype, not torch_device)
)

# remove VAE encoder as it's not needed
del pipe.vae.encoder

# now move to GPU which should not consume more than 5GB
pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]
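
If all goes well, the pipeline returns PIL images, so the result can be written to disk as usual:

image.save("astronaut_rides_horse.png")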