LinFusion: 1 GPU, 1 Minute, 16K Image
Songhua Liu, Weihao Yu, Zhenxiong Tan, and Xinchao Wang
Learning and Vision Lab, National University of Singapore
[2024/11/24] LinFusion now supports a Triton implementation, which is even more efficient than the previous naive one! We would like to extend our sincere gratitude to @hp-133 for the amazing work!
[2024/09/28] We release evaluation code for the COCO benchmark!
[2024/09/27] We successfully integrate LinFusion into DistriFusion, an effective and efficient strategy for generating an image in parallel, and achieve even greater acceleration! Please refer to the example here!
[2024/09/26] We enable 16K image generation with merely 24 GB of video memory! Please refer to the example here!
[2024/09/20] We release a more advanced pipeline for ultra-high-resolution image generation using SD-XL! It can be used for text-to-image generation and image super-resolution!
[2024/09/20] We release training code for Stable Diffusion XL here!
[2024/09/13] We release LinFusion models for Stable Diffusion v-2.1 and Stable Diffusion XL!
[2024/09/13] We release training code for Stable Diffusion v-1.5, v-2.1, and their variants here!
[2024/09/08] We release code for 16K image generation here!
[2024/09/05] A Gradio demo for SD-v1.5 is released! Text-to-image, image-to-image, and IP-Adapter are currently supported.
Yuanshi/LinFusion-1-5: For Stable Diffusion v-1.5 and its variants.
Yuanshi/LinFusion-2-1: For Stable Diffusion v-2.1 and its variants.
Yuanshi/LinFusion-XL: For Stable Diffusion XL and its variants.
Clone this repo to your project directory:
git clone https://github.com/Huage001/LinFusion.git
You only need two lines!
import torch
from diffusers import AutoPipelineForText2Image
from src.linfusion import LinFusion  # LinFusion line 1 of 2

sd_repo = "Lykon/dreamshaper-8"
pipeline = AutoPipelineForText2Image.from_pretrained(
    sd_repo, torch_dtype=torch.float16, variant="fp16"
).to(torch.device("cuda"))

# LinFusion line 2 of 2: build a matching LinFusion model and mount it onto the pipeline
linfusion = LinFusion.construct_for(pipeline)

image = pipeline(
    "An astronaut floating in space. Beautiful view of the stars and the universe in the background.",
    generator=torch.manual_seed(123),
).images[0]
LinFusion.construct_for(pipeline) returns a LinFusion model that matches the pipeline's structure, and this model is automatically mounted onto the pipeline's forward function.
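The "construct a matching model and mount it" behaviour can be pictured with a toy, purely hypothetical example: walk the network, create one replacement for every self-attention layer, and swap it in place so that later forward calls use the new layer. None of the class or function names below (SelfAttention, LinearAttention, ToyUNet, construct_for_toy) come from LinFusion or diffusers; the real construct_for operates on an actual diffusers pipeline.

```python
# Hypothetical illustration of the module-replacement pattern.
# These toy classes are NOT LinFusion's or diffusers' code.

class SelfAttention:
    """Stand-in for a softmax self-attention layer."""
    kind = "softmax"


class LinearAttention:
    """Stand-in for a linear-attention replacement layer."""
    kind = "linear"


class Block:
    def __init__(self):
        self.attn = SelfAttention()


class ToyUNet:
    def __init__(self, num_blocks):
        self.blocks = [Block() for _ in range(num_blocks)]


def construct_for_toy(unet):
    """Create one replacement per self-attention layer and swap it in place."""
    replacements = []
    for block in unet.blocks:
        if isinstance(block.attn, SelfAttention):
            new_attn = LinearAttention()
            block.attn = new_attn  # subsequent forward passes use the new layer
            replacements.append(new_attn)
    return replacements


unet = ToyUNet(num_blocks=3)
mounted = construct_for_toy(unet)
print(len(mounted), [b.attn.kind for b in unet.blocks])
# 3 ['linear', 'linear', 'linear']
```

Because the replacement mirrors the original layer layout one-to-one, the rest of the pipeline (scheduler, VAE, text encoder) needs no changes.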
examples/inference/basic_usage.ipynb shows a basic text-to-image example.
From the perspective of efficiency, our method supports high-resolution generation such as 16K images. Nevertheless, directly applying diffusion models trained at low resolutions to higher-resolution generation can result in content distortion and duplication. To tackle this challenge, we apply the following techniques:
SDEdit. The basic idea is to generate a low-resolution result first and then gradually upscale it. Please refer to examples/inference/ultra_text2image_w_sdedit.ipynb for an example.
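A minimal sketch of the SDEdit idea on a toy 2-D grid, not the notebook's code: upscale the previous result, then forward-diffuse it to an intermediate noise level with x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise. A denoiser would then run only the remaining steps from t back to 0. The helper names and the alpha_bar value are illustrative assumptions; the actual pipeline works on latents inside diffusers and picks its own schedule.

```python
import math
import random


def upsample_nearest_2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D list (stand-in for image/latent upscaling)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def perturb(grid, alpha_bar, rng):
    """Forward-diffuse x0 to x_t: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * v + s * rng.gauss(0.0, 1.0) for v in row] for row in grid]


rng = random.Random(0)
low_res = [[0.1, 0.2], [0.3, 0.4]]
x0 = upsample_nearest_2x(low_res)          # 4 x 4 upscaled "image"
x_t = perturb(x0, alpha_bar=0.6, rng=rng)  # partially noised starting point
# A smaller alpha_bar means more noise: more detail is regenerated,
# while a larger one keeps more of the low-resolution layout.
print(len(x_t), len(x_t[0]))
```

Starting from a partially noised upscaled image, rather than pure noise, is what lets the low-resolution composition survive while fine details are synthesized at the new scale.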
DemoFusion. It also generates high-resolution images from low-resolution results. Latents of the low-resolution generation are reused for high-resolution generation. Dilated convolutions are involved. Compared with the original version:
Please refer to examples/inference/ultra_text2image_sdxl.ipynb for an example of high-resolution text-to-image generation (first generate at 1024 resolution, then at 2048, 4096, 8192, etc.) and to examples/inference/superres_sdxl.ipynb for an example of image super-resolution (directly upscale to the target resolution; 2x is generally recommended, applied multiple times if you want higher scales).
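The coarse-to-fine text-to-image schedule (1024, then 2048, 4096, 8192, ...) and the advice to apply 2x super-resolution repeatedly amount to the same plan. The helper below is a hypothetical sketch, not part of the repo, that lists the intermediate resolutions to run.

```python
def upscale_schedule(start, target, factor=2):
    """Resolutions visited when repeatedly upscaling by `factor` until `target` is reached."""
    if target < start or factor < 2:
        raise ValueError("need target >= start and factor >= 2")
    stages = []
    res = start
    while res < target:
        res = min(res * factor, target)  # never overshoot the requested resolution
        stages.append(res)
    return stages


print(upscale_schedule(1024, 8192))   # [2048, 4096, 8192]
print(upscale_schedule(1024, 16384))  # [2048, 4096, 8192, 16384]
```

Reaching 16K from a 1024 base therefore takes four 2x passes, each conditioned on the previous stage's output.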
The code above for 16K image generation requires a GPU with 80 GB of video memory. If you encounter out-of-memory (OOM) issues, consider examples/inference/superres_sdxl_low_w_mem.ipynb, which requires only 24 GB of video memory. We achieve this by 1) chunked forward passes for classifier-free guidance inference, 2) chunked forward passes of the feed-forward networks in Transformer blocks, 3) in-place activation functions in ResNets, and 4) caching UNet residuals on the CPU.
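Two of these memory savings can be illustrated with a stdlib-only sketch; the toy feed-forward function and denoiser are assumptions, not the notebook's code. A position-wise feed-forward network treats each token independently, so applying it chunk by chunk over the token axis gives exactly the same result while only one chunk's activations are alive at a time. Likewise, classifier-free guidance can run the unconditional and conditional passes one after another (batch 1 each) instead of as a single batch of 2, then combine them with eps_uncond + scale * (eps_cond - eps_uncond).

```python
def feed_forward(x):
    """Toy position-wise FFN: each token is transformed independently."""
    return [max(0.0, 2.0 * v - 1.0) for v in x]


def chunked_ffn(tokens, chunk_size):
    """Apply the FFN chunk by chunk along the token axis."""
    out = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        out.extend(feed_forward(t) for t in chunk)
    return out


def cfg_sequential(denoise, latent, cond, uncond, scale):
    """Classifier-free guidance with two sequential passes instead of one batch-2 pass."""
    eps_uncond = denoise(latent, uncond)
    eps_cond = denoise(latent, cond)
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


tokens = [[0.1 * i, 0.2 * i] for i in range(10)]
assert chunked_ffn(tokens, chunk_size=3) == [feed_forward(t) for t in tokens]


def toy_denoise(latent, emb):
    """Hypothetical denoiser: scales the latent by a scalar 'embedding'."""
    return [v * emb for v in latent]


guided = cfg_sequential(toy_denoise, [1.0, 2.0], cond=1.0, uncond=0.5, scale=7.5)
print(guided)  # [4.25, 8.5]
```

Both tricks trade a little extra wall-clock time for a lower peak memory footprint, which matters most at 16K where activations dominate.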
DistriFusion. Alternatively, if you have multiple GPUs, you can try integrating LinFusion into DistriFusion, which achieves more significant acceleration thanks to LinFusion's linearity and thus almost constant communication cost. You can run a minimal example with:
torchrun --nproc_per_node=$N_GPUS -m examples.inference.sdxl_distrifusion_example
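Why linearity keeps communication nearly constant can be sketched with generic kernelized linear attention, out_i = phi(q_i)^T S / (phi(q_i)^T z) with S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j). Each device summarizes its own tokens into S and z, the summaries are summed across devices, and every device then answers its own queries. The payload is d * d_v + d numbers no matter how many tokens there are, whereas softmax attention would need every device's keys and values. This is an assumption-laden illustration: the elu(x) + 1 feature map is a common textbook choice, not necessarily LinFusion's exact formulation, and the "devices" are plain list shards.

```python
import math
import random


def phi(vec):
    """Positive feature map elu(x) + 1 (an assumed, commonly used choice)."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def kv_summary(keys, values):
    """Per-device summary: S = sum_j phi(k_j) v_j^T (d x dv) and z = sum_j phi(k_j)."""
    d, dv = len(keys[0]), len(values[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(keys, values):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    return S, z


def all_reduce(summaries):
    """Element-wise sum of (S, z) across devices; size is independent of token count."""
    S_tot = [[sum(col) for col in zip(*rows)] for rows in zip(*(s for s, _ in summaries))]
    z_tot = [sum(vals) for vals in zip(*(z for _, z in summaries))]
    return S_tot, z_tot


def attend(queries, S, z):
    """out_i = phi(q_i)^T S / (phi(q_i)^T z)."""
    out = []
    for q in queries:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(len(z)))
        out.append([sum(fq[a] * S[a][b] for a in range(len(z))) / denom
                    for b in range(len(S[0]))])
    return out


rng = random.Random(0)
N, D, DV, DEVICES = 8, 3, 2, 4
q = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(N)]
k = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(N)]
v = [[rng.uniform(-1, 1) for _ in range(DV)] for _ in range(N)]

reference = attend(q, *kv_summary(k, v))  # single-device result

shard = N // DEVICES
partials = [kv_summary(k[i * shard:(i + 1) * shard], v[i * shard:(i + 1) * shard])
            for i in range(DEVICES)]
S, z = all_reduce(partials)
distributed = []
for i in range(DEVICES):  # each device answers only its own queries
    distributed.extend(attend(q[i * shard:(i + 1) * shard], S, z))

print("numbers exchanged per device:", D * DV + D)
```

Doubling N leaves the exchanged payload unchanged, which is the property that makes the parallel speed-up scale well.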
We are working on integrating LinFusion with more advanced approaches dedicated to high-resolution extension! Feel free to create pull requests if you come up with better solutions!
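To put the 16K claims in this section in perspective, here is a back-of-the-envelope calculation. It assumes a Stable Diffusion-style VAE with 8x spatial downsampling and counts tokens at the full latent resolution. Real UNets also attend at coarser levels, and memory-efficient kernels avoid materializing the full matrix, though the compute of softmax attention remains quadratic in the token count.

```python
VAE_FACTOR = 8   # assumption: Stable Diffusion-style 8x VAE downsampling
BYTES_FP16 = 2


def latent_tokens(pixels):
    """Tokens in the latent of a square image with side length `pixels`."""
    side = pixels // VAE_FACTOR
    return side * side


def attn_matrix_gib(tokens):
    """Memory of one N x N fp16 attention-score matrix (single head, batch 1)."""
    return tokens * tokens * BYTES_FP16 / 2**30


for res in (1024, 4096, 16384):
    n = latent_tokens(res)
    print(f"{res}px -> {n} tokens, one score matrix = {attn_matrix_gib(n):.1f} GiB")
# 1024px -> 16384 tokens, one score matrix = 0.5 GiB
# 4096px -> 262144 tokens, one score matrix = 128.0 GiB
# 16384px -> 4194304 tokens, one score matrix = 32768.0 GiB
```

A linear-attention layer instead keeps a fixed-size summary per head, so its cost grows only linearly with the token count.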
Before training, make sure the packages listed in requirements.txt are installed:
pip install -r requirements.txt
Training code for Stable Diffusion v-1.5, v-2.1, and their variants is released in src/train/distill.py. We provide an example script in examples/train/distill.sh. You can run it on an 8-GPU machine via:
bash ./examples/train/distill.sh
By default, the code automatically downloads the bhargavsdesai/laion_improved_aesthetics_6.5plus_with_images dataset to the ~/.cache directory if it is not already present. The dataset contains 169k images and requires ~75 GB of disk space.
We use fp16 precision and 512 resolution for Stable Diffusion v-1.5 and bf16 precision and 768 resolution for Stable Diffusion v-2.1.
Training code for Stable Diffusion XL is released in src/train/distill_xl.py. We provide an example script in examples/train/distill_xl.sh. You can run it on an 8-GPU machine via:
bash ./examples/train/distill_xl.sh
Following GigaGAN, we use 30,000 COCO captions to generate 30,000 images for evaluation. FID against COCO val2014 is reported as a metric, and the CLIP cosine similarity between each caption and its generated image is used to reflect text-image alignment.
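The alignment metric boils down to averaging cosine similarities over matched (caption, generated image) pairs. The stdlib sketch below shows only that arithmetic on made-up vectors standing in for CLIP embeddings; in practice the embeddings come from a CLIP model, and the repo's calculate_metrics script may scale or aggregate the scores differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_text_image_similarity(text_embs, image_embs):
    """Average cosine similarity over matched (caption, generated image) pairs."""
    if len(text_embs) != len(image_embs):
        raise ValueError("each caption needs exactly one generated image")
    return sum(cosine(t, i) for t, i in zip(text_embs, image_embs)) / len(text_embs)


# Toy 2-D "embeddings" (placeholders, not real CLIP outputs)
texts = [[1.0, 0.0], [0.0, 1.0]]
images = [[2.0, 0.0], [1.0, 1.0]]
print(round(mean_text_image_similarity(texts, images), 4))  # 0.8536
```

Cosine similarity ignores embedding magnitude, so only the direction of the text and image embeddings affects the score.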
To evaluate LinFusion, first install the required packages:
pip install git+https://github.com/openai/CLIP.git
pip install click clean-fid open_clip_torch
Download and unzip COCO val2014 to /path/to/coco:
wget http://images.cocodataset.org/zips/val2014.zip
unzip val2014.zip -d /path/to/coco
Run examples/eval/eval.sh to generate images for evaluation. You may need to specify outdir, repo_id, resolution, etc.
bash examples/eval/eval.sh
Run examples/eval/calculate_metrics.sh to calculate the metrics. You may need to specify /path/to/coco, fake_dir, etc.
bash examples/eval/calculate_metrics.sh
If you find this repo helpful, please consider citing:
@article{liu2024linfusion,
title = {LinFusion: 1 GPU, 1 Minute, 16K Image},
author = {Liu, Songhua and Yu, Weihao and Tan, Zhenxiong and Wang, Xinchao},
year = {2024},
eprint = {2409.02097},
archivePrefix={arXiv},
primaryClass={cs.CV}
}