SusungHong / Self-Attention-Guidance

The implementation of the paper "Improving Sample Quality of Diffusion Models Using Self-Attention Guidance" (ICCV'23)
MIT License

Self-Attention Diffusion Guidance (ICCV'23)

This is the implementation of the paper "Improving Sample Quality of Diffusion Models Using Self-Attention Guidance" by Hong et al. For insights from our exploration of the self-attention maps of diffusion models, and for detailed explanations, please see our Paper and Project Page.

This repository is based on openai/guided-diffusion, and we modified the feature extraction code from yandex-research/ddpm-segmentation to obtain the self-attention maps. The core of our method is implemented in ./guided_diffusion/gaussian_diffusion.py and ./guided_diffusion/unet.py.

All you need to do is set up the environment, download existing models, and sample from them using our implementation. Neither further training nor a dataset is needed to apply self-attention guidance!

Updates

2023-08-14: This repository supports DDIM sampling with SAG.

2023-02-19: The Gradio Demo 🤗 of SAG for Stable Diffusion is now available.

2023-02-16: The Stable Diffusion pipeline of SAG is now available at huggingface/diffusers 🤗🧨.

2023-02-01: The demo for Stable Diffusion is now available in Colab.
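
As a rough illustration of the diffusers integration mentioned in the updates, the sketch below shows how SAG might be invoked through `StableDiffusionSAGPipeline` and its `sag_scale` argument. The helper name `generate_with_sag`, the default model ID, and the default scales are assumptions for illustration, and whether the pipeline is available depends on the installed diffusers version.

```python
def generate_with_sag(prompt, sag_scale=0.75, guidance_scale=7.5,
                      model_id="runwayml/stable-diffusion-v1-5"):
    """Generate one image with Stable Diffusion plus self-attention guidance.

    Hypothetical helper for illustration; model_id is an assumed checkpoint name.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionSAGPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionSAGPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    # In this pipeline, sag_scale=0.0 disables SAG; guidance_scale controls
    # classifier-free guidance and is independent of SAG.
    result = pipe(prompt, sag_scale=sag_scale, guidance_scale=guidance_scale)
    return result.images[0]

# Example call (downloads model weights on first use):
# image = generate_with_sag("a photo of a cat")
# image.save("sag_sample.png")
```

Calling the helper once with `sag_scale=0.0` and once with a positive value, using the same seed, is a simple way to compare outputs with and without SAG.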

Environment

Downloading Pretrained Diffusion Models (and Classifiers for CG)

Pretrained weights for ImageNet and LSUN can be downloaded from the openai/guided-diffusion repository. Download them and place them in the ./models/ directory.
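
Before sampling, it can help to confirm that the downloaded files actually landed in ./models/. The sketch below is one way to do that; the helper `missing_checkpoints` is hypothetical and the file names in `EXPECTED_CHECKPOINTS` are assumptions, so substitute the names of the checkpoints you downloaded.

```python
from pathlib import Path

# Assumed file names, for illustration only; replace with your downloaded files.
EXPECTED_CHECKPOINTS = ["128x128_diffusion.pt", "128x128_classifier.pt"]


def missing_checkpoints(model_dir="./models", expected=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_checkpoints()` means every expected file is in place; otherwise the returned names are the ones still to download.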

Sampling from Pretrained Diffusion Models

You can sample from pretrained diffusion models with self-attention guidance by changing SAG_FLAGS in the following commands. Note that sampling with --guide_scale 1.0 means sampling without self-attention guidance. Below are four examples.
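
Separately from the repository's example commands, the note about --guide_scale 1.0 can be illustrated with a toy calculation. The sketch assumes the usual extrapolation form of guidance, in which the regular noise prediction is pushed away from a prediction made on a degraded copy of the sample (in the paper, a copy blurred where self-attention is high). Under that assumption a scale of 1.0 returns the regular prediction unchanged. The function name `apply_guide_scale` and the numbers are hypothetical, and the relation guide_scale = 1 + s (with s the scale reported in the results table, where 0.0 is the baseline) is likewise an assumption, not something read from the repository's code.

```python
def apply_guide_scale(eps, eps_degraded, guide_scale):
    """Combine two noise predictions by linear extrapolation.

    eps          : prediction on the current sample (toy values here)
    eps_degraded : prediction on the degraded copy of the sample
    guide_scale  : 1.0 reproduces eps; larger values move further away
                   from eps_degraded (assumed form, see lead-in)
    """
    return [d + guide_scale * (e - d) for e, d in zip(eps, eps_degraded)]


eps = [0.20, -0.10, 0.40]           # toy prediction on the current sample
eps_degraded = [0.10, 0.00, 0.30]   # toy prediction on the degraded sample

unguided = apply_guide_scale(eps, eps_degraded, 1.0)  # equals eps up to rounding
guided = apply_guide_scale(eps, eps_degraded, 1.5)    # amplifies the difference

print("guide_scale=1.0:", unguided)
print("guide_scale=1.5:", guided)
```

With a scale of 1.0 the degraded prediction cancels out, which is why that setting corresponds to sampling without guidance in this formulation.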

Results

Compatibility of self-attention guidance (SAG) and classifier guidance (CG) on ImageNet 128x128 model:

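| SAG | CG | FID | sFID | Precision | Recall |
|-----|----|------|------|-----------|--------|
|     |    | 5.91 | 5.09 | 0.70 | 0.65 |
|     | V  | 2.97 | 5.09 | 0.78 | 0.59 |
| V   |    | 5.11 | 4.09 | 0.72 | 0.65 |
| V   | V  | 2.58 | 4.35 | 0.79 | 0.59 |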

Results on pretrained models:

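| Model | # of steps | Self-attention guidance scale | FID | sFID | IS | Precision | Recall |
|-------|------------|-------------------------------|-----|------|----|-----------|--------|
| ImageNet 256×256 (Uncond.) | 250 | 0.0 (baseline) | 26.21 | 6.35 | 39.70 | 0.61 | 0.63 |
| ImageNet 256×256 (Uncond.) | 250 | 0.5 | 20.31 | 5.09 | 45.30 | 0.66 | 0.61 |
| ImageNet 256×256 (Uncond.) | 250 | 0.8 | 20.08 | 5.77 | 45.56 | 0.68 | 0.59 |
| ImageNet 256×256 (Cond.) | 250 | 0.0 (baseline) | 10.94 | 6.02 | 100.98 | 0.69 | 0.63 |
| ImageNet 256×256 (Cond.) | 250 | 0.2 | 9.41 | 5.28 | 104.79 | 0.70 | 0.62 |
| LSUN Cat 256×256 | 250 | 0.0 (baseline) | 7.03 | 8.24 | - | 0.60 | 0.53 |
| LSUN Cat 256×256 | 250 | 0.05 | 6.87 | 8.21 | - | 0.60 | 0.50 |
| LSUN Horse 256×256 | 250 | 0.0 (baseline) | 3.45 | 7.55 | - | 0.68 | 0.56 |
| LSUN Horse 256×256 | 250 | 0.01 | 3.43 | 7.51 | - | 0.68 | 0.55 |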
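
To put the FID numbers in the table above on a common footing, the short sketch below computes the relative FID reduction of the best listed SAG scale against each baseline. The values are copied from the table; the helper name `relative_reduction` is introduced here for illustration only.

```python
# (model, baseline FID, lowest FID among the listed SAG scales), copied from the table
FID_RESULTS = [
    ("ImageNet 256x256 (Uncond.)", 26.21, 20.08),
    ("ImageNet 256x256 (Cond.)", 10.94, 9.41),
    ("LSUN Cat 256x256", 7.03, 6.87),
    ("LSUN Horse 256x256", 3.45, 3.43),
]


def relative_reduction(baseline, guided):
    """Percentage by which `guided` is lower than `baseline`."""
    return 100.0 * (baseline - guided) / baseline


for model, baseline, guided in FID_RESULTS:
    print(f"{model}: FID {baseline:.2f} -> {guided:.2f} "
          f"({relative_reduction(baseline, guided):.1f}% lower)")
```

Lower FID is better, so a larger percentage means a larger improvement. The other columns (sFID, IS, Precision, Recall) are not summarized by this calculation.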

Cite as

@inproceedings{hong2023improving,
  title={Improving sample quality of diffusion models using self-attention guidance},
  author={Hong, Susung and Lee, Gyuseong and Jang, Wooseok and Kim, Seungryong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={7462--7471},
  year={2023}
}