Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Yanzuo Lu, Manlin Zhang, Andy J Ma, Xiaohua Xie, Jian-Huang Lai
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), June 17-21, 2024, Seattle, USA
If you want to cite or compare with our method, please download the generated images from Google Drive here (including 256x176 and 512x352 on DeepFashion, and 128x64 on Market-1501).
Create the conda environment from the provided file:

```bash
conda env create -f environment.yaml
```
Download the DeepFashion dataset and extract it under the ./fashion directory. (A password is required; please contact the authors of DeepFashion (not us!) for permission.) The structure of the ./fashion directory is as follows.
```
fashion
├── fashion-resize-annotation-test.csv
├── fashion-resize-annotation-train.csv
├── fashion-resize-pairs-test.csv
├── fashion-resize-pairs-train.csv
├── MEN
├── test.lst
├── train.lst
└── WOMEN
```
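A quick sanity check that this layout is in place can save debugging time later. The helper below is a minimal sketch (not part of the repository); it only verifies the entries shown in the tree above:

```python
from pathlib import Path

# Entries expected under ./fashion, per the tree above.
EXPECTED = [
    "fashion-resize-annotation-test.csv",
    "fashion-resize-annotation-train.csv",
    "fashion-resize-pairs-test.csv",
    "fashion-resize-pairs-train.csv",
    "MEN",
    "test.lst",
    "train.lst",
    "WOMEN",
]

def missing_fashion_entries(root="./fashion"):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]
```

If the returned list is non-empty, the dataset was not extracted where the scripts expect it.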
Run the dataset preprocessing script:

```bash
python generate_fashion_datasets.py
```

Download the following pre-trained models on demand and put them under the ./pretrained_models directory.
| Model | Official Repository | Publicly Available |
| --- | --- | --- |
| U-Net | runwayml/stable-diffusion-v1-5 | diffusion_pytorch_model.safetensors |
| VAE | runwayml/stable-diffusion-v1-5 | diffusion_pytorch_model.safetensors |
| Swin-B | microsoft/Swin-Transformer | swin_base_patch4_window12_384_22kto1k.pth |
| CLIP (ablation only) | openai/clip-vit-large-patch14 | model.safetensors |
The structure of the ./pretrained_models directory is as follows.
```
pretrained_models
├── clip
│   ├── config.json
│   └── model.safetensors
├── scheduler
│   └── scheduler_config.json
├── swin
│   └── swin_base_patch4_window12_384_22kto1k.pth
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    └── diffusion_pytorch_model.safetensors
```
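Before launching training, it can be worth verifying that every checkpoint from the table above is present and non-empty (a partially downloaded safetensors file will fail late). This is an illustrative sketch, not repository code; the file list mirrors the tree above:

```python
import os

# Relative paths expected under ./pretrained_models, per the tree above.
EXPECTED_FILES = [
    "clip/config.json",
    "clip/model.safetensors",
    "scheduler/scheduler_config.json",
    "swin/swin_base_patch4_window12_384_22kto1k.pth",
    "unet/config.json",
    "unet/diffusion_pytorch_model.safetensors",
    "vae/config.json",
    "vae/diffusion_pytorch_model.safetensors",
]

def check_pretrained(root="./pretrained_models"):
    """Map each expected file to True iff it exists and is non-empty."""
    return {
        rel: os.path.isfile(os.path.join(root, rel))
        and os.path.getsize(os.path.join(root, rel)) > 0
        for rel in EXPECTED_FILES
    }
```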
For multi-GPU training, run the following command by default.

```bash
bash scripts/multi_gpu/pose_transfer_train.sh 0,1,2,3,4,5,6,7
```
For single-GPU training, run the following command by default.

```bash
bash scripts/single_gpu/pose_transfer_train.sh 0
```
For ablation studies, specify the config file as in the following example.

```bash
bash scripts/multi_gpu/pose_transfer_train.sh 0,1,2,3,4,5,6,7 --config_file configs/ablation_study/no_app.yaml
```
For multi-GPU testing, specify the checkpoint path as in the following example.

```bash
bash scripts/multi_gpu/pose_transfer_test.sh 0,1,2,3,4,5,6,7 MODEL.PRETRAINED_PATH checkpoints
```
For single-GPU testing, specify the checkpoint path as in the following example.

```bash
bash scripts/single_gpu/pose_transfer_test.sh 0 MODEL.PRETRAINED_PATH checkpoints
```
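The trailing `MODEL.PRETRAINED_PATH checkpoints` arguments are dotted KEY VALUE overrides applied on top of the YAML config, a common yacs-style pattern. The snippet below is a hypothetical illustration of how such pairs merge into a nested config (it is not the repository's actual config code; `merge_from_list` here is a stand-in written with plain dicts):

```python
def merge_from_list(cfg, opts):
    """Apply [KEY, VALUE, KEY, VALUE, ...] overrides to a nested dict,
    where dotted keys like MODEL.PRETRAINED_PATH address nested fields."""
    assert len(opts) % 2 == 0, "overrides must come in KEY VALUE pairs"
    for key, value in zip(opts[::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return cfg

# e.g. the override used in the test command above:
cfg = merge_from_list({"MODEL": {"PRETRAINED_PATH": ""}},
                      ["MODEL.PRETRAINED_PATH", "checkpoints"])
```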
```bibtex
@inproceedings{lu2024coarse,
  title={Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis},
  author={Lu, Yanzuo and Zhang, Manlin and Ma, Andy J and Xie, Xiaohua and Lai, Jian-Huang},
  booktitle={CVPR},
  year={2024}
}
```