This repo is based on the official Stable Diffusion repo and its variants, enabling Stable Diffusion to run on a GPU with only 1GB of VRAM.
To reduce VRAM usage, several optimizations are applied.
Establish a virtual environment and install the dependencies as described in the official repo. The quantized model checkpoint can be downloaded from Google Drive.
Only txt2img is supported for now.
txt2img
Generates 512x512 images from a prompt using under 1GB of GPU VRAM (evaluated with PyTorch 2.0 on an RTX 3090).
For example, the following command will generate 10 512x512 images:
python3 tiny_optimizedSD/tiny_txt2img.py --prompt "A peaceful lakeside cabin with a dock, surrounded by tall pine trees and a clear blue sky" --H 512 --W 512 --seed 27
--seed
Seed for image generation; can be used to reproduce previously generated images. Defaults to a random seed if unspecified.
The code reports the seed number along with each generated image. To generate the same image again, just specify the seed using the --seed argument. Images are saved with their seed number as their name by default.
For example, if the seed number for an image is 1234 and it is the 55th image in the folder, the image will be named seed_1234_00055.png.
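The naming scheme above can be sketched as follows (a hypothetical helper illustrating the pattern, not the repo's actual code):

```python
def image_filename(seed: int, index: int) -> str:
    # Combine the seed with a zero-padded running index,
    # matching names like seed_1234_00055.png.
    return f"seed_{seed}_{index:05d}.png"

print(image_filename(1234, 55))  # seed_1234_00055.png
```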
--n_samples
Batch size: the number of images to generate at once.
To get the lowest inference time per image, use the largest batch size --n_samples that fits on the GPU. Inference time per image decreases as the batch size increases, but the required VRAM increases as well.
If you get a CUDA out-of-memory error, try reducing the batch size --n_samples. If that doesn't help, reduce the image width --W, the height --H, or both.
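The advice above can be automated with a simple backoff loop (a hedged sketch; generate_with_backoff and fake_generate are hypothetical, and CUDA out-of-memory errors surface as runtime exceptions in PyTorch):

```python
def generate_with_backoff(generate, n_samples):
    # Halve the batch size until generation fits in available VRAM.
    while n_samples >= 1:
        try:
            return generate(n_samples)
        except RuntimeError:  # PyTorch raises a RuntimeError on CUDA OOM
            n_samples //= 2
    raise RuntimeError("Even one sample does not fit; reduce --W or --H too.")

# Simulated generator: pretend batches larger than 2 run out of memory.
def fake_generate(n):
    if n > 2:
        raise RuntimeError("CUDA out of memory")
    return [f"img{i}" for i in range(n)]

print(generate_with_backoff(fake_generate, 8))  # ['img0', 'img1']
```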
--n_iter
Runs the whole generation the given number of times, producing n_samples images on each run. Reducing it has no effect on the required VRAM or on inference time per image.
--H & --W
Height and width of the generated image.
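VRAM usage grows with the image resolution because Stable Diffusion denoises a latent tensor downsampled 8x in each spatial dimension. The standard latent shape can be sketched as follows (latent_shape is a hypothetical helper, not the repo's code):

```python
def latent_shape(n_samples: int, H: int, W: int) -> tuple:
    # Standard Stable Diffusion latents: 4 channels, with the spatial
    # dimensions divided by 8 (so H and W should be multiples of 8).
    return (n_samples, 4, H // 8, W // 8)

print(latent_shape(1, 512, 512))  # (1, 4, 64, 64)
```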
--turbo
Increases inference speed at the cost of extra VRAM usage.
--precision autocast or --precision full
Whether to use full or mixed precision. Mixed precision (autocast) is used by default; pass the --precision full argument to disable it.
--format png or --format jpg
Output image format, png by default. While png is lossless, it takes up a lot of space (unless large portions of the image happen to be a single colour). Use lossy jpg to get smaller image file sizes.
--unet_bs
Batch size for the unet model. Takes up a lot of extra RAM for very little improvement in inference time; unet_bs > 1 is not recommended! If used, it should generally be a multiple of 2x(n_samples).
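The 2x factor likely comes from classifier-free guidance, which feeds the UNet a conditional and an unconditional copy of every sample at each step (an assumed explanation, sketched with a hypothetical helper):

```python
def unet_input_batch(n_samples: int, cfg: bool = True) -> int:
    # With classifier-free guidance the UNet processes two copies of each
    # sample per step, hence unet_bs as a multiple of 2 * n_samples.
    return 2 * n_samples if cfg else n_samples

print(unet_input_batch(3))  # 6
```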
Prompts can also be weighted to put relative emphasis on certain words, e.g. --prompt "tabby cat:0.25 white duck:0.75 hybrid".
The number after the colon represents the weight given to the words before the colon. The weights can be either fractions or integers.
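A parser for this weighting syntax might look like the following sketch (split_weighted_subprompts here is a hypothetical re-implementation for illustration, not necessarily the repo's exact logic; unweighted trailing words are assumed to default to weight 1):

```python
import re

def split_weighted_subprompts(prompt: str) -> list:
    # Match "words:weight" pairs; any trailing words without a weight
    # get the assumed default weight of 1.0.
    pairs, end = [], 0
    for m in re.finditer(r"(.+?):([0-9]*\.?[0-9]+)\s*", prompt):
        pairs.append((m.group(1).strip(), float(m.group(2))))
        end = m.end()
    rest = prompt[end:].strip()
    if rest:
        pairs.append((rest, 1.0))
    return pairs

print(split_weighted_subprompts("tabby cat:0.25 white duck:0.75 hybrid"))
# [('tabby cat', 0.25), ('white duck', 0.75), ('hybrid', 1.0)]
```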
To force full precision, use the --precision full argument. The downside is that it will lead to higher GPU VRAM usage.