CFGpp-diffusion / CFGpp

Official repository for "CFG++: manifold-constrained classifier free guidance for diffusion models"

CFG++ : MANIFOLD-CONSTRAINED CLASSIFIER FREE GUIDANCE FOR DIFFUSION MODELS

main figure

Project Website arXiv


Summary

Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG requires a high guidance scale, which has notable drawbacks:

  1. Mode collapse and saturation
  2. Poor invertibility
  3. Unnatural, curved PF-ODE trajectory

We propose CFG++, a simple fix to this seemingly inherent limitation that corrects the off-manifold problem of CFG. The following advantages are observed:

  1. A small guidance scale $\lambda \in [0, 1]$ can be used with an effect similar to $\omega \in [1.0, 12.5]$ in CFG
  2. Better sample quality and better adherence to text
  3. Smooth, straighter PF-ODE trajectory
  4. Enhanced invertibility

Experimental results confirm that our method significantly enhances performance in text-to-image generation, DDIM inversion, editing, and solving inverse problems, suggesting a wide-ranging impact and potential applications in various fields that utilize text guidance.
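
As a rough illustration of where the two guidance scales enter, the sketch below contrasts a single deterministic DDIM step under CFG and under CFG++, following the formulation in the CFG++ paper: both compute the denoised estimate from the guided noise, but CFG++ renoises with the unconditional noise instead of the guided noise. This is not the code from this repository. The function name `ddim_step` and its arguments are hypothetical, and `alpha_t` / `alpha_prev` are assumed to be the cumulative noise-schedule products at the current and next timestep.

```python
def ddim_step(x_t, eps_uncond, eps_cond, alpha_t, alpha_prev, scale, cfgpp):
    """One deterministic DDIM update (illustrative sketch, hypothetical names).

    Uses only plain arithmetic, so x_t and the noise estimates may be floats,
    NumPy arrays, or PyTorch tensors. alpha_t and alpha_prev are scalars.
    """
    # Guided noise estimate: scale is omega for CFG, lambda for CFG++.
    eps_guided = eps_uncond + scale * (eps_cond - eps_uncond)

    # Denoised estimate (Tweedie's formula) from the guided noise.
    x0_hat = (x_t - (1 - alpha_t) ** 0.5 * eps_guided) / alpha_t ** 0.5

    # Renoising: CFG reuses the guided noise, CFG++ uses the unconditional one.
    eps_renoise = eps_uncond if cfgpp else eps_guided
    return alpha_prev ** 0.5 * x0_hat + (1 - alpha_prev) ** 0.5 * eps_renoise


# Toy scalar inputs, chosen only to exercise the function.
x_t, eps_uncond, eps_cond = 1.0, 0.2, 0.5
alpha_t, alpha_prev = 0.5, 0.8

x_prev_cfg = ddim_step(x_t, eps_uncond, eps_cond, alpha_t, alpha_prev,
                       scale=7.5, cfgpp=False)
x_prev_cfgpp = ddim_step(x_t, eps_uncond, eps_cond, alpha_t, alpha_prev,
                         scale=0.6, cfgpp=True)
```

The only difference between the two branches is the noise used in the renoising term, which is why the change is simple to apply to an existing DDIM sampler.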

Setup

First, create your environment. We recommend using the following commands.

git clone https://github.com/CFGpp-diffusion/CFGpp.git
cd CFGpp
conda env create -f environment.yaml

For reproducibility, use the same package versions, since some dependencies (for instance, diffusers) lead to significant differences across versions. Nonetheless, the improvement induced by CFG++ will be observed regardless of the dependency versions.
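
To check which versions are actually installed, a small helper like the one below can be used and its output compared by hand against environment.yaml. This is only a sketch: the helper name `installed_versions` is hypothetical, and the package list is an assumption (diffusers is named above, while `torch` and `transformers` are guesses at other relevant dependencies).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed package list; adjust it to match environment.yaml.
for name, version in installed_versions(["torch", "diffusers", "transformers"]).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```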

If you run one of the examples below, diffusers will automatically download the checkpoints for SDv1.5 or SDXL.

Examples

Text-to-Image generation

Image Inversion

[!tip] If you want to use SDXL, add --model sdxl.

Callback

We provide callback functionality to monitor intermediate samples during the diffusion reverse process. For now, the callback can be called only at the end of each timestep, to keep the scripts readable.

Currently, we provide two options (default: None).

Note that using a callback may take more time because intermediate files are saved. Refer to utils/callback_util.py for details.
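
The general pattern of calling a callback at the end of each timestep can be sketched as follows. This is a generic illustration, not the interface in utils/callback_util.py: `run_reverse_process`, `step_fn`, and the `(step, sample)` callback signature are all hypothetical, and a toy scalar update stands in for the actual denoising step.

```python
def run_reverse_process(num_steps, step_fn, x, callback=None):
    """Run a toy reverse process, calling the callback after each step."""
    for step in range(num_steps):
        x = step_fn(step, x)       # one reverse-process update (placeholder)
        if callback is not None:
            callback(step, x)      # called only at the end of the timestep
    return x


# Example callback: record intermediate samples in memory. A real callback
# might save an image to disk instead, which is where the extra time goes.
history = []
final = run_reverse_process(
    num_steps=3,
    step_fn=lambda step, x: x * 0.5,
    x=8.0,
    callback=lambda step, x: history.append((step, x)),
)
```

Keeping the callback optional (default `None`) means the sampling loop behaves identically when monitoring is switched off.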

Citation

If you find our method useful, please cite it as below or leave a star on this repository.

@article{chung2024cfg++,
  title={CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models},
  author={Chung, Hyungjin and Kim, Jeongsol and Park, Geon Yeong and Nam, Hyelin and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2406.08070},
  year={2024}
}

[!note] This work is currently in the preprint stage, and there may be some changes to the code.