RenYurui / Global-Flow-Local-Attention

The source code for paper "Deep Image Spatial Transformation for Person Image Generation"
https://renyurui.github.io/GFLA-web

Website | ArXiv | Getting Started

Global-Flow-Local-Attention

The source code for our paper "Deep Image Spatial Transformation for Person Image Generation" (CVPR2020)

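We propose a Global-Flow Local-Attention Model for deep image spatial transformation. Our model can be flexibly applied to tasks such as pose-guided person image generation, pose-guided person image animation, face image animation, and novel view synthesis.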

Left: generated results of our model; Right: Input source images.

Leftmost: Skeleton sequences. The others: Animation results.

Left: Input image; Right: Output results.

From left to right: Input image, results of Appearance Flow, results of ours, ground-truth images.

News

Colab Demo

For a quick exploration of our model, try the online Colab demo.

Getting Started

1) Installation

Requirements

Conda installation

# 1. Create a conda virtual environment.
conda create -n gfla python=3.6 -y
source activate gfla

# 2. Install the dependencies.
pip install -r requirement.txt

# 3. Build the PyTorch custom CUDA extensions.
./setup.sh

Note: The current code is tested with a Tesla V100. If you use a different GPU, you may need to select the correct nvcc_args for it when you build the custom CUDA extensions. Comment or uncomment the --gencode lines in block_extractor/setup.py, local_attn_reshape/setup.py, and resample2d_package/setup.py. Please check here for details.
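The architecture named in a --gencode entry follows from the GPU's compute capability (the Tesla V100 is 7.0, hence sm_70). The sketch below shows that mapping. It assumes the nvcc_args lists in the three setup.py files hold flag pairs in the usual nvcc form `arch=compute_XY,code=sm_XY`, written here with the short `-gencode` spelling; `gencode_args` is a hypothetical helper and not part of this repository.

```python
def gencode_args(major, minor):
    """Return the nvcc -gencode flag pair for one compute capability.

    Hypothetical helper for illustration; not part of this repository.
    """
    cc = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{cc},code=sm_{cc}"]


# Tesla V100 has compute capability 7.0.
print(gencode_args(7, 0))  # ['-gencode', 'arch=compute_70,code=sm_70']
```

With PyTorch installed and a GPU visible, `torch.cuda.get_device_capability(0)` returns the `(major, minor)` pair to pass in.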

2) Download Resources

We provide the pre-trained weights of our model. The resources are listed as follows:

Download the pre-trained models and the demo images by running the following command:

./download.sh

3) Pose-Guided Person Image Generation

The Pose-Guided Person Image Generation task is to transfer a source person image to a target pose.

Run the demo of this task:

python demo.py \
--name=pose_fashion_checkpoints \
--model=pose \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=fashion \
--dataroot=./dataset/fashion \
--results_dir=./demo_results/fashion

For more training and testing details, please see PERSON_IMAGE_GENERATION.md.
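In the demo command above, `--attn_layer=2,3` and `--kernel_size=2=5,3=3` read naturally as a list of layers and a `layer=size` mapping. The sketch below spells out that reading. It assumes, without confirmation from this README, that each pair assigns a local-attention kernel size to one attention layer; `parse_kernel_size` is a hypothetical helper and not the repository's own argument parser.

```python
def parse_kernel_size(spec):
    """Parse 'layer=size,layer=size' into {layer: size} (illustrative only)."""
    mapping = {}
    for pair in spec.split(","):
        layer, size = pair.split("=")
        mapping[int(layer)] = int(size)
    return mapping


attn_layers = [int(x) for x in "2,3".split(",")]
kernel_sizes = parse_kernel_size("2=5,3=3")

# Under this reading, every attention layer has exactly one kernel size.
assert set(attn_layers) == set(kernel_sizes)
print(kernel_sizes)  # {2: 5, 3: 3}
```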

4) Pose-Guided Person Image Animation

The Pose-Guided Person Image Animation task generates a video clip from a still source image according to a driving target sequence. We further model the temporal consistency for this task.

Run the demo of this task:

python demo.py \
--name=dance_fashion_checkpoints \
--model=dance \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=dance \
--sub_dataset=fashion \
--dataroot=./dataset/danceFashion \
--results_dir=./demo_results/dance_fashion \
--test_list=val_list.csv

For more training and testing details, please see PERSON_IMAGE_ANIMATION.md.

5) Face Image Animation

Given an input source image and a guidance video sequence depicting the structure movements, our model generates a video containing the specified movements.

Run the demo of this task:

python demo.py \
--name=face_checkpoints \
--model=face \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=face \
--dataroot=./dataset/FaceForensics \
--results_dir=./demo_results/face 

We use the real videos of the FaceForensics dataset. See FACE_IMAGE_ANIMATION.md for more details.
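The three demo commands in sections 3 to 5 share most of their flags. As a convenience sketch, the script below rebuilds them from one table. The `COMMON` and `DEMOS` dictionaries and the `build_command` helper are hypothetical and not part of this repository; every flag value is copied from the commands above.

```python
# Flags shared by all three demo commands in this README.
COMMON = {"attn_layer": "2,3", "kernel_size": "2=5,3=3", "gpu_id": "0"}

# Task-specific flags, copied from the demo commands above.
DEMOS = {
    "pose": {
        "name": "pose_fashion_checkpoints",
        "model": "pose",
        "dataset_mode": "fashion",
        "dataroot": "./dataset/fashion",
        "results_dir": "./demo_results/fashion",
    },
    "dance": {
        "name": "dance_fashion_checkpoints",
        "model": "dance",
        "dataset_mode": "dance",
        "sub_dataset": "fashion",
        "dataroot": "./dataset/danceFashion",
        "results_dir": "./demo_results/dance_fashion",
        "test_list": "val_list.csv",
    },
    "face": {
        "name": "face_checkpoints",
        "model": "face",
        "dataset_mode": "face",
        "dataroot": "./dataset/FaceForensics",
        "results_dir": "./demo_results/face",
    },
}


def build_command(task):
    """Return the demo.py argument list for one task (illustrative helper)."""
    flags = {**DEMOS[task], **COMMON}
    return ["python", "demo.py"] + [f"--{k}={v}" for k, v in flags.items()]


for task in DEMOS:
    print(" ".join(build_command(task)))
```

To launch a demo rather than print it, pass the list to `subprocess.run(build_command("pose"), check=True)` from the repository root.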

6) Novel View Synthesis

View synthesis requires generating novel views of objects or scenes based on arbitrary input views.

In this task, we use the car and chair categories of the ShapeNet dataset. See VIEW_SYNTHESIS.md for more details.

Citation

@article{ren2020deep,
  title={Deep Image Spatial Transformation for Person Image Generation},
  author={Ren, Yurui and Yu, Xiaoming and Chen, Junming and Li, Thomas H and Li, Ge},
  journal={arXiv preprint arXiv:2003.00696},
  year={2020}
}

@article{ren2020deepspatial,
  title={Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation},
  author={Ren, Yurui and Li, Ge and Liu, Shan and Li, Thomas H},
  journal={IEEE Transactions on Image Processing},
  year={2020},
  publisher={IEEE}
}

Acknowledgement

We build our project based on Vid2Vid. Some dataset preprocessing methods are derived from Pose-Transfer.