
High-Fidelity Neural Human Motion Transfer from Monocular Video


Website | Collaborators | ArXiv | Video

Official PyTorch implementation of our HF-NHMT method, as described in our paper 'High-Fidelity Neural Human Motion Transfer from Monocular Video'.

Prerequisites

Dataset

You can download our training and test sequences from here.

Getting Started

Installation

Data Preprocessing

Our method operates on monocular video footage, using pose keypoints as input and the original video frames as training labels. Additionally, we employ dense body part segmentations and a Gabor-filter-based structure representation as intermediate training annotations. After training, new source motions can be transferred to the actor using only our skeleton representation, which can additionally be normalized to fit the training data distribution as described in the paper by Chan et al.
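For intuition, here is a minimal sketch of what a Gabor-filter-based structure representation can look like: filter responses stacked over several evenly spaced orientations. The function name and all kernel parameters are illustrative assumptions, not the exact settings used by generate_dataset.py:

```python
# Hypothetical sketch of a Gabor-filter-based structure representation;
# kernel parameters are assumptions, not the repository's actual settings.
import cv2
import numpy as np

def structure_representation(image_path, num_orientations=4):
    """Stack Gabor filter responses over several orientations."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    responses = []
    for i in range(num_orientations):
        theta = i * np.pi / num_orientations  # evenly spaced filter orientations
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0.0)
        responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))
    return np.stack(responses, axis=-1)  # H x W x num_orientations response stack
```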

To train on a custom dataset, you need to provide a directory containing a static background image of the scene called background.png, and a subdirectory called images/ comprising the video frames in natural ordering (we use a resolution of 512x512 pixels). Then run

./scripts/generate_dataset.py -i <path_to_dataset_directory> --train

to automatically generate all the required training inputs and labels. Several parameters and paths to third-party code can be adjusted directly in the script file.
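As a quick sanity check before running the script, the sketch below verifies the expected directory layout and illustrates what natural ordering means for frame filenames (digit runs are compared numerically, so frame_2 precedes frame_10). The helper is hypothetical and not part of the repository:

```python
# Hypothetical layout check; only background.png and images/ are
# documented requirements of the dataset directory.
import os
import re

def check_dataset(root):
    """Verify the dataset layout and list frames in natural order."""
    assert os.path.isfile(os.path.join(root, 'background.png')), 'missing background.png'
    image_dir = os.path.join(root, 'images')
    assert os.path.isdir(image_dir), 'missing images/ subdirectory'

    def natural_key(name):
        # split on digit runs and compare them numerically: frame_2 < frame_10
        return [int(part) if part.isdigit() else part for part in re.split(r'(\d+)', name)]

    frames = sorted(os.listdir(image_dir), key=natural_key)
    assert frames, 'images/ is empty'
    print(f'{len(frames)} frames, e.g. {frames[0]} ... {frames[-1]}')
    return frames
```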

If you want to generate only pose skeleton inputs from a source motion sequence, use:

./scripts/generate_dataset.py -i <source_actor_dataset_path> --pose_norm <target_actor_dataset_path> 

Here, statistics from the target actor dataset will be extracted to perform pose and motion normalization. Alternatively, unnormalized pose skeletons (generated for training) can be used for comparable source and target actor sequences.
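For reference, the sketch below shows a simplified version of the global pose normalization idea from Chan et al. ("Everybody Dance Now"): rescale and translate source skeletons so that body height and ankle position match statistics of the target actor. The statistics format and function signature are assumptions for illustration; the actual script derives its statistics from the datasets:

```python
# Simplified, hypothetical version of global pose normalization in the
# spirit of Chan et al.; not the repository's actual implementation.
import numpy as np

def normalize_pose(src_keypoints, src_stats, tgt_stats):
    """src_keypoints: (J, 2) array of 2D joint positions for one frame.
    *_stats: dicts with mean body 'height' and 'ankle_y' in pixels
    (assumed format)."""
    scale = tgt_stats['height'] / src_stats['height']   # match body heights
    normalized = src_keypoints.astype(np.float32) * scale
    # translate vertically so the scaled source ankle line matches the target's
    normalized[:, 1] += tgt_stats['ankle_y'] - src_stats['ankle_y'] * scale
    return normalized
```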

Training

To train a new model, you need a dataset and a configuration file; the configuration file is created automatically in the configs directory when the data preprocessing script is called. A list of all configuration options can be found here. After customizing the dataset and training parameters in the configuration file, run

./scripts/train.py -c <config_file_path>

to start the training.

Inference

To transfer new motion sequences to the target actor, adjust the dataset and inference parameters in the configuration file before running:

./scripts/run.py -c <config_file_path>

Note that during inference, the dataset path refers to the source actor sequence containing the new pose skeletons instead of the dataset used to train the target actor networks.

Citation

If you use our code in your publications, please cite our paper with the following BibTeX entry:

@InProceedings{Kappel_2021_CVPR,
    author    = {Kappel, Moritz and Golyanik, Vladislav and Elgharib, Mohamed and Henningson, Jann-Ole and Seidel, Hans-Peter and Castillo, Susana and Theobalt, Christian and Magnor, Marcus},
    title     = {High-Fidelity Neural Human Motion Transfer From Monocular Video},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {1541--1550}
}

Acknowledgments

This work was partially funded by the DFG (MA2555/15-1 "Immersive Digital Reality") and the ERC Consolidator Grant 4DRepLy (770784).