Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li
We provide our PyTorch implementation of Semantic Image Synthesis via Diffusion Models (SDM). In this paper, we propose a novel framework based on DDPM for semantic image synthesis. Previous conditional diffusion models feed the semantic layout and the noisy image directly into a U-Net structure, which may not fully leverage the information in the input semantic mask. Our framework instead processes the semantic layout and the noisy image differently: it feeds the noisy image to the encoder of the U-Net structure and the semantic layout to the decoder through multi-layer spatially-adaptive normalization operators. To further improve the generation quality and semantic interpretability in semantic image synthesis, we introduce the classifier-free guidance sampling strategy, which takes the scores of an unconditional model into account during the sampling process. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our proposed method, achieving state-of-the-art performance in terms of fidelity (FID) and diversity (LPIPS).
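As a rough illustration of classifier-free guidance, the sketch below combines a conditional and an unconditional noise prediction with a guidance scale. It assumes the common rule `eps = eps_cond + scale * (eps_cond - eps_uncond)`. The function name `guided_noise` and the flat-list representation are hypothetical and are not taken from this repository, which operates on PyTorch tensors.

```python
from typing import List


def guided_noise(eps_cond: List[float], eps_uncond: List[float], scale: float) -> List[float]:
    """Combine conditional and unconditional noise predictions.

    Assumed rule (not copied from the repository):
        eps = eps_cond + scale * (eps_cond - eps_uncond)
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    # Push the conditional prediction further away from the unconditional one.
    return [c + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy, flattened predictions standing in for the model outputs.
eps_cond = [0.5, -1.0, 2.0]
eps_uncond = [0.0, -1.0, 1.0]
print(guided_noise(eps_cond, eps_uncond, scale=1.5))  # [1.25, -1.0, 3.5]
```

With `scale=0` the result equals the conditional prediction. Larger values weight the semantic layout more strongly.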
The Cityscapes and ADE20K datasets can be downloaded and prepared following SPADE. CelebAMask-HQ can be downloaded from CelebAMask-HQ; you need to integrate the separate annotations into a single image file (in the same format as the other datasets, e.g. Cityscapes and ADE20K).
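Integrating the separate annotations could look roughly like the sketch below, which merges per-part binary masks into one label map. The part names, their order, and the resulting label indices are illustrative assumptions and must be matched to what the data loader expects. Reading the masks and writing the result as an image file (for example with Pillow) is omitted.

```python
from typing import Dict, List, Sequence

Mask = List[List[int]]  # binary mask: 1 where the part is present


def merge_part_masks(part_masks: Dict[str, Mask], part_order: Sequence[str],
                     height: int, width: int) -> Mask:
    """Merge per-part binary masks into a single label map.

    Label 0 is background and part_order[i] receives label i + 1.
    Parts later in part_order overwrite earlier ones where they overlap.
    """
    label_map = [[0] * width for _ in range(height)]
    for label, name in enumerate(part_order, start=1):
        mask = part_masks.get(name)
        if mask is None:  # this part is not annotated for the current image
            continue
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = label
    return label_map


# Hypothetical part list and tiny 2x3 masks, for illustration only.
order = ["skin", "nose", "hair"]
masks = {
    "skin": [[0, 1, 1], [0, 1, 1]],
    "nose": [[0, 0, 1], [0, 0, 0]],
}
print(merge_part_masks(masks, order, height=2, width=3))  # [[0, 1, 2], [0, 1, 1]]
```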
Download the dataset.
Train the SDM model:
export OPENAI_LOGDIR='OUTPUT/ADE20K-SDM-256CH'
mpiexec -n 8 python image_train.py --data_dir ./data/ade20k --dataset_mode ade20k --lr 1e-4 --batch_size 4 --attention_resolutions 32,16,8 --diffusion_steps 1000 \
--image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 \
--resblock_updown True --use_fp16 True --use_scale_shift_norm True --use_checkpoint True --num_classes 151 \
--class_cond True --no_instance True
Fine-tune the SDM model:
export OPENAI_LOGDIR='OUTPUT/ADE20K-SDM-256CH-FINETUNE'
mpiexec -n 8 python image_train.py --data_dir ./data/ade20k --dataset_mode ade20k --lr 2e-5 --batch_size 4 --attention_resolutions 32,16,8 --diffusion_steps 1000 \
--image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 \
--resblock_updown True --use_fp16 True --use_scale_shift_norm True --use_checkpoint True --num_classes 151 --class_cond True \
--no_instance True --drop_rate 0.2 --resume_checkpoint OUTPUT/ADE20K-SDM-256CH/model.pt
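The fine-tuning command above adds `--drop_rate 0.2`. Assuming this flag controls how often the semantic layout is replaced by a null condition during training (the usual way to prepare a model for classifier-free guidance), the idea can be sketched as follows. The names `drop_labels` and `null_label` are hypothetical, and how the repository actually represents the null condition is not shown here.

```python
import random
from typing import List, Optional

LabelMap = List[List[int]]


def drop_labels(label_maps: List[LabelMap], drop_rate: float,
                null_label: int = 0,
                rng: Optional[random.Random] = None) -> List[LabelMap]:
    """Replace each label map with a null map with probability drop_rate.

    null_label is a placeholder for "no semantic condition" (an assumption).
    """
    if rng is None:
        rng = random.Random()
    out = []
    for label_map in label_maps:
        if rng.random() < drop_rate:
            # Condition dropped: the model sees an empty layout for this sample.
            out.append([[null_label] * len(row) for row in label_map])
        else:
            # Condition kept: copy the rows so the input is not modified.
            out.append([row[:] for row in label_map])
    return out


batch = [[[3, 3], [7, 7]], [[1, 2], [2, 1]]]
dropped = drop_labels(batch, drop_rate=0.2, rng=random.Random(0))
print(dropped)
```

Training on a mix of conditioned and null-conditioned samples lets a single network provide both the conditional and the unconditional predictions needed at sampling time.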
Test the SDM model:
mpiexec -n 8 python image_sample.py --data_dir ./data/ade20k --dataset_mode ade20k --attention_resolutions 32,16,8 --diffusion_steps 1000 \
--image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 \
--num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --num_classes 151 \
--class_cond True --no_instance True --batch_size 2 --num_samples 2000 --s 1.5 \
--model_path OUTPUT/ADE20K-SDM-256CH-FINETUNE/ema_0.9999_best.pt --results_path RESULTS/ADE20K-SDM-256CH
Please refer to 'scripts/ade20.sh' for more details.
Dataset | Download link | |
---|---|---|
Cityscapes | Visual results | |
ADE20K | Checkpoint | Visual results |
CelebAMask-HQ | Checkpoint | Visual results |
COCO-Stuff | Checkpoint | Visual results |
To evaluate the model (e.g., ADE20K), first generate the test results:
mpiexec -n 8 python image_sample.py --data_dir ./data/ade20k --dataset_mode ade20k --attention_resolutions 32,16,8 --diffusion_steps 1000 \
--image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 \
--num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True --num_classes 151 \
--class_cond True --no_instance True --batch_size 2 --num_samples 2000 --s 1.5 \
--model_path OUTPUT/ADE20K-SDM-256CH-FINETUNE/ema_0.9999_best.pt --results_path RESULTS/ADE20K-SDM-256CH
To calculate the FID metric, update "path1" and "path2" in "evaluations/test_with_FID.py" and run:
python evaluations/test_with_FID.py
To calculate LPIPS, evaluate the model 10 times and run:
python evaluations/lpips.py GENERATED_IMAGES_DIR
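Diversity scores of this kind are typically computed as the average pairwise distance between several samples generated for the same input, which is why multiple sampling runs are needed. The sketch below shows that averaging step only. `toy_distance` is a stand-in for the learned LPIPS metric, and whether "evaluations/lpips.py" aggregates in exactly this way is an assumption.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def mean_pairwise_distance(samples: Sequence[List[float]],
                           distance: Callable[[List[float], List[float]], float]) -> float:
    """Average a distance function over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def toy_distance(a: List[float], b: List[float]) -> float:
    """Mean absolute difference; a placeholder for a perceptual metric such as LPIPS."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Three toy "images" generated for the same semantic layout.
samples = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(mean_pairwise_distance(samples, toy_distance))  # 1.0
```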
Our code is developed based on guided-diffusion. We also use "test_with_FID.py" from OASIS for FID computation and "lpips.py" from stargan-v2 for LPIPS computation, and thank their authors.