# Person Image Synthesis via Denoising Diffusion Model
ArXiv | Project | Demo | YouTube
## News
- **2023.02** A demo is available on Google Colab: :rocket: [Demo on Colab](https://colab.research.google.com/github/ankanbhunia/PIDM/blob/main/PIDM_demo.ipynb)
## Generated Results
You can directly download our test results from Google Drive: (1) [PIDM.zip](https://drive.google.com/file/d/1zcyTF37UrOmUqtRwwq1kgkyxnNX3oaQN/view?usp=share_link) (2) [PIDM_vs_Others.zip](https://drive.google.com/file/d/1iu75RVQBjR-TbB4ZQUns1oalzYZdNqGS/view?usp=share_link)
The [PIDM_vs_Others.zip](https://drive.google.com/file/d/1iu75RVQBjR-TbB4ZQUns1oalzYZdNqGS/view?usp=share_link) file compares our method with several state-of-the-art methods, e.g. ADGAN [14], PISE [24], GFLA [20], DPTN [25], CASD [29], and NTED [19]. Each row shows the target pose, source image, ground truth, and the outputs of ADGAN, PISE, GFLA, DPTN, CASD, NTED, and PIDM (ours), in that order.
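If you prefer to fetch these archives from the command line, here is a minimal sketch using the third-party `gdown` package (`pip install gdown`; not part of this repo), with the Drive links taken from above:

```python
# Download the result archives from the Google Drive links above using gdown.
import gdown

archives = {
    "PIDM.zip": "https://drive.google.com/file/d/1zcyTF37UrOmUqtRwwq1kgkyxnNX3oaQN/view?usp=share_link",
    "PIDM_vs_Others.zip": "https://drive.google.com/file/d/1iu75RVQBjR-TbB4ZQUns1oalzYZdNqGS/view?usp=share_link",
}

for name, url in archives.items():
    # fuzzy=True lets gdown extract the file id from a share link
    gdown.download(url=url, output=name, fuzzy=True, quiet=False)
```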
## Dataset
- Download `img_highres.zip` of the DeepFashion Dataset from [In-shop Clothes Retrieval Benchmark](https://drive.google.com/drive/folders/0B7EVK8r0v71pYkd5TzBiclMzR00).
- Unzip `img_highres.zip`. You will need to request the password from the [dataset maintainers](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/InShopRetrieval.html). Then rename the extracted folder to **img** and place it under the `./dataset/deepfashion` directory.
- We split the train/test set following [GFLA](https://github.com/RenYurui/Global-Flow-Local-Attention). Several images with significant occlusions are removed from the training set. Download the train/test pairs and the keypoints extracted with [Openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose) as follows:
- Download the train/test pairs from [Google Drive](https://drive.google.com/drive/folders/1PhnaFNg9zxMZM-ccJAzLIt2iqWFRzXSw?usp=sharing) including **train_pairs.txt**, **test_pairs.txt**, **train.lst**, **test.lst**. Put these files under the `./dataset/deepfashion` directory.
- Download the keypoints `pose.rar` extracted with Openpose from [Google Drive](https://drive.google.com/file/d/1waNzq-deGBKATXMU9JzMDWdGsF4YkcW_/view?usp=sharing). Extract it and put the obtained folder under the `./dataset/deepfashion` directory.
- Run the following command to save the images into an lmdb dataset (a quick sanity check of the expected folder layout is sketched after this list).
```bash
python data/prepare_data.py \
--root ./dataset/deepfashion \
--out ./dataset/deepfashion
```
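Before running `prepare_data.py`, it can help to check that the expected files are in place. This is only a sketch based on the layout described above; the exact set of files `prepare_data.py` requires may differ.

```python
# Hypothetical sanity check of the DeepFashion layout described above.
from pathlib import Path

root = Path("./dataset/deepfashion")
expected = ["img", "pose", "train_pairs.txt", "test_pairs.txt", "train.lst", "test.lst"]

for name in expected:
    path = root / name
    print(("OK      " if path.exists() else "MISSING ") + str(path))
```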
## Custom Dataset
The folder structure of any custom dataset should be as follows:
- `dataset/`
  - `<dataset_name>/`
    - `img/`
    - `pose/`
    - `train_pairs.txt`
    - `test_pairs.txt`
All your images go inside the `img` folder; you can keep them directly in `img` or organize them into subfolders. The corresponding poses are stored inside the `pose` folder (as txt files if you use OpenPose; in this project, we use 18-point keypoint estimation). `train_pairs.txt` and `test_pairs.txt` list all possible pairs, with the two paths in each pair separated by a comma `,`. An illustrative way to generate such a pairs file is sketched below.
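As an illustration only (the dataset name and the all-pairs logic below are hypothetical, not part of this repo), a pairs file in this format could be generated like so:

```python
# Hypothetical example: write a comma-separated pairs file for a custom dataset.
# Pairing every image with every other image is just the simplest choice; adapt
# the pairing logic to whatever source/target pairs make sense for your data.
from itertools import permutations
from pathlib import Path

dataset_root = Path("./dataset/my_dataset")  # hypothetical dataset name
images = sorted(str(p.relative_to(dataset_root)) for p in (dataset_root / "img").rglob("*.jpg"))

with open(dataset_root / "train_pairs.txt", "w") as f:
    for src, tgt in permutations(images, 2):
        f.write(f"{src},{tgt}\n")
```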
Once the dataset is organized as above, run the following command to process the data:
```bash
python data/prepare_data.py \
--root ./dataset/<dataset_name> \
--out ./dataset/<dataset_name> \
--sizes "((256,256),)"
```
This will create an lmdb dataset at `./dataset/<dataset_name>/256-256/`.
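To confirm that the lmdb dataset was written, you can open it read-only with the `lmdb` package and check the number of entries (the internal key format is specific to `prepare_data.py`, so this only counts records):

```python
# Minimal check: open the generated lmdb read-only and report how many entries it holds.
import lmdb

path = "./dataset/<dataset_name>/256-256"  # replace <dataset_name> as above
env = lmdb.open(path, readonly=True, lock=False, readahead=False, meminit=False)
with env.begin() as txn:
    print("entries:", txn.stat()["entries"])
```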
## Conda Installation
``` bash
# 1. Create a conda virtual environment.
conda create -n PIDM python=3.7
conda activate PIDM
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
# 2. Clone the repo and install dependencies
git clone https://github.com/ankanbhunia/PIDM
cd PIDM
pip install -r requirements.txt
```
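After installation, a quick check that PyTorch was built with CUDA support and can see your GPU (standard `torch` calls, nothing PIDM-specific):

```python
# Verify the PyTorch installation and CUDA availability.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```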
## Method
## Training
This code supports multi-GPU training. Full training takes 5 days with 8 A100 GPUs and a batch size of 8 on the DeepFashion dataset. The model is trained for 300 epochs; however, it generates high-quality, usable samples after 200 epochs. We also trained with V100 GPUs, which took a similar amount of time.
```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port 48949 train.py \
--dataset_path "./dataset/deepfashion" --batch_size 8 --exp_name "pidm_deepfashion"
```
## Inference
Download the pretrained model from [here](https://drive.google.com/file/d/1WkV5Pn-_fBdiZlvVHHx_S97YESBkx4lD/view?usp=share_link) and place it in the ```checkpoints``` folder.
For pose control, use `obj.predict_pose` as in the following code snippet.
```python
from predict import Predictor

obj = Predictor()

# Replace <source_image> with your source image input.
obj.predict_pose(image=<source_image>, sample_algorithm='ddim', num_poses=4, nsteps=50)
```
For appearance control, use `obj.predict_appearance`:
```python
from predict import Predictor

obj = Predictor()

# Fill in the placeholders below with your own inputs (see predict.py for the expected formats).
src = <source_image>
ref_img = <reference_image>
ref_mask = <reference_mask>
ref_pose = <reference_pose>

obj.predict_appearance(image=src, ref_img=ref_img, ref_mask=ref_mask, ref_pose=ref_pose, sample_algorithm='ddim', nsteps=50)
```
The output will be saved as `output.png`.
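If you want to inspect the result programmatically rather than opening the file by hand, a minimal Pillow snippet (assuming Pillow is installed in the environment) looks like this:

```python
# Load and display the saved result with Pillow.
from PIL import Image

out = Image.open("output.png")
print(out.size, out.mode)
out.show()
```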
## Citation
If you use the results and code for your research, please cite our paper:
```
@article{bhunia2022pidm,
title={Person Image Synthesis via Denoising Diffusion Model},
author={Bhunia, Ankan Kumar and Khan, Salman and Cholakkal, Hisham and Anwer, Rao Muhammad and Laaksonen, Jorma and Shah, Mubarak and Khan, Fahad Shahbaz},
journal={CVPR},
year={2023}
}
```
[Ankan Kumar Bhunia](https://scholar.google.com/citations?user=2leAc3AAAAAJ&hl=en),
[Salman Khan](https://scholar.google.com/citations?user=M59O9lkAAAAJ&hl=en),
[Hisham Cholakkal](https://scholar.google.com/citations?user=bZ3YBRcAAAAJ&hl=en),
[Rao Anwer](https://scholar.google.fi/citations?user=_KlvMVoAAAAJ&hl=en),
[Jorma Laaksonen](https://scholar.google.com/citations?user=qQP6WXIAAAAJ&hl=en),
[Mubarak Shah](https://scholar.google.com/citations?user=p8gsO3gAAAAJ&hl=en) &
[Fahad Khan](https://scholar.google.ch/citations?user=zvaeYnUAAAAJ&hl=en&oi=ao)