Huage001 / LinFusion

Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"
Apache License 2.0
# LinFusion

LinFusion: 1 GPU, 1 Minute, 16K Image
Songhua Liu, Weihao Yu, Zhenxiong Tan, and Xinchao Wang
Learning and Vision Lab, National University of Singapore

🔥News

[2024/11/24] LinFusion now has a Triton implementation, which is much more efficient than the previous naive one! We extend our sincere gratitude to @hp-133 for the amazing work!

[2024/09/28] We release evaluation codes on the COCO benchmark!

[2024/09/27] We successfully integrate LinFusion into DistriFusion, an effective and efficient strategy for generating an image in parallel, and achieve even more significant acceleration! Please refer to the example here!

[2024/09/26] We enable 16K image generation with only 24 GB of GPU memory! Please refer to the example here!

[2024/09/20] We release a more advanced pipeline for ultra-high-resolution image generation using SD-XL! It can be used for text-to-image generation and image super-resolution!

[2024/09/20] We release training codes for Stable Diffusion XL here!

[2024/09/13] We release LinFusion models for Stable Diffusion v-2.1 and Stable Diffusion XL!

[2024/09/13] We release training codes for Stable Diffusion v-1.5, v-2.1, and their variants here!

[2024/09/08] We release codes for 16K image generation here!

[2024/09/05] Gradio demo for SD-v1.5 is released! Text-to-image, image-to-image, and IP-Adapter are currently supported.

Supported Models

  1. Yuanshi/LinFusion-1-5: For Stable Diffusion v-1.5 and its variants.
  2. Yuanshi/LinFusion-2-1: For Stable Diffusion v-2.1 and its variants.
  3. Yuanshi/LinFusion-XL: For Stable Diffusion XL and its variants.

Quick Start
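A minimal usage sketch with Diffusers is shown below. The `LinFusion.construct_for` helper and the `src.linfusion` import path are assumptions based on this repository's source layout; check the repo for the exact API, and pick the LinFusion checkpoint matching your base model.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Assumption: run from the repository root so that `src.linfusion` is importable;
# the exact module path and API may differ from this sketch.
from src.linfusion import LinFusion

# Any SD-v1.5 variant should pair with Yuanshi/LinFusion-1-5.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "Lykon/dreamshaper-8",
    torch_dtype=torch.float16,
).to("cuda")

# Swap the pipeline's self-attention for LinFusion's linear attention,
# loading the matching LinFusion checkpoint.
linfusion = LinFusion.construct_for(pipeline)

image = pipeline("a photo of a cat", num_inference_steps=25).images[0]
image.save("cat.png")
```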

Gradio Demo

Ultrahigh-Resolution Generation

In terms of efficiency, our method supports high-resolution generation, such as 16K images. However, directly applying diffusion models trained at low resolutions to higher-resolution generation can cause content distortion and duplication. To tackle this challenge, we apply the following techniques:
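The efficiency behind ultrahigh-resolution generation comes from replacing quadratic softmax attention with linear attention, whose cost grows linearly in the number of tokens. A minimal numpy sketch of generic kernelized linear attention — not the exact LinFusion module, whose feature map and normalization differ:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Generic linear attention: O(N * d^2) instead of O(N^2 * d).

    q, k: (N, d) queries/keys; v: (N, d) values.
    Uses a simple non-negative feature map phi(x) = elu(x) + 1.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v              # (d, d): summarize all keys/values once
    z = q @ k.sum(axis=0)     # (N,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
N, d = 1024, 64               # 16K images imply huge N; cost stays linear in N
q, k, v = rng.standard_normal((3, N, d))
out = linear_attention(q, k, v)
print(out.shape)              # (1024, 64)
```

Because keys and values are summarized into a single (d, d) matrix, memory and compute no longer grow quadratically with the token count, which is what makes 16K resolutions tractable on a single GPU.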

Training

Evaluation

Following GigaGAN, we use 30,000 COCO captions to generate 30,000 images for evaluation. FID against COCO val2014 is reported as a metric, and CLIP text cosine similarity is used to reflect the text-image alignment.
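FID compares the Gaussian statistics of Inception features extracted from generated and reference images. A self-contained numpy sketch of the Fréchet distance itself (the Inception feature extraction is omitted; the function name is illustrative):

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fit to two feature sets.

    FID = ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 * sqrtm(C_a @ C_b)).
    Tr(sqrtm(C_a @ C_b)) is computed from the eigenvalues of C_a @ C_b,
    which are real and non-negative for covariance matrices.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    eigvals = np.linalg.eigvals(c_a @ c_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    diff = mu_a - mu_b
    return diff @ diff + np.trace(c_a) + np.trace(c_b) - 2 * tr_sqrt

rng = np.random.default_rng(0)
a = rng.standard_normal((5000, 16))
b = rng.standard_normal((5000, 16)) + 0.5  # shifted distribution
d_same = frechet_distance(a, a)            # ~0 for identical feature sets
d_diff = frechet_distance(a, b)            # grows with the mean shift
```

In practice, features come from an Inception-v3 network over the 30,000 generated images and the COCO val2014 references; lower FID indicates closer distributions.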

ToDo

Acknowledgement

Citation

If you find this repo helpful, please consider citing:

@article{liu2024linfusion,
  title     = {LinFusion: 1 GPU, 1 Minute, 16K Image},
  author    = {Liu, Songhua and Yu, Weihao and Tan, Zhenxiong and Wang, Xinchao},
  year      = {2024},
  eprint    = {2409.02097},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}