Paper | Video | Project Page
This repository contains the official implementation of the paper:
Simon Giebenhain,
Tobias Kirschstein,
Markos Georgopoulos,
Martin Rünz,
Lourdes Agapito and
Matthias Nießner
CVPR 2024 Highlight
a) Set up a conda environment and activate it via
conda env create -f environment.yml
conda activate mononphm
which creates a new environment named mononphm. (Installation may take some time.)
b) Next, manually install PyTorch-related packages. MonoNPHM depends on PyTorch3D and PyTorch Geometric, which can sometimes be tricky to install. On Linux, the following order of commands worked for us:
# Install PyTorch with CUDA support
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
# Install PyTorch Geometric and helper packages with CUDA support
conda install pyg -c pyg
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
# Install PyTorch3D with CUDA support
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d=0.7.4 -c pytorch3d
Finally, fix the numpy version using
pip uninstall numpy
pip install numpy==1.23
pip install pyopengl==3.1.5
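As an optional sanity check (not part of the official setup), you can verify that the tricky dependencies are importable and that CUDA is visible:
python -c 'import torch, pytorch3d, torch_geometric; print(torch.cuda.is_available())'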
c) Install the mononphm package in editable mode by running
pip install -e .
All paths to data / models / inference are defined by environment variables. For this, we recommend creating a file at ~/.config/mononphm/.env with the following content:
MONONPHM_CODE_BASE="{LOCATION OF THIS REPOSITORY}"
MONONPHM_TRAINING_SUPERVISION="{LOCATION WHERE TRAINING SUPERVISION DATA WILL BE STORED}"
MONONPHM_DATA="{LOCATION OF NPHM DATASET}"
MONONPHM_EXPERIMENT_DIR="{LOCATION FOR TRAINING RUNS}"
MONONPHM_DATA_TRACKING="{LOCATION FOR TRACKING INPUT}"
MONONPHM_TRACKING_OUTPUT="{LOCATION FOR TRACKING OUTPUT}"
Replace the {...} placeholders with the locations where data / models / experiments should be stored on your machine. If you prefer not to create a config file in your home directory, you can instead hard-code the paths in env.py. Note that using the .config folder can be a great advantage when working with different machines, e.g. a local PC and a GPU cluster.
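A filled-in .env file might look like this (all paths below are hypothetical examples):
MONONPHM_CODE_BASE="/home/alice/code/MonoNPHM"
MONONPHM_TRAINING_SUPERVISION="/mnt/data/mononphm/training_supervision"
MONONPHM_DATA="/mnt/data/NPHM"
MONONPHM_EXPERIMENT_DIR="/mnt/experiments/mononphm"
MONONPHM_DATA_TRACKING="/mnt/data/tracking_input"
MONONPHM_TRACKING_OUTPUT="/mnt/data/tracking_output"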
Our tracking algorithm relies on FLAME tracking for initialization. Therefore, you will need an account on the FLAME website.
To clone the necessary repositories for preprocessing and apply minor code adjustments to them, run
bash install_preprocessing_pipeline.sh
Finally, you will need to download the weights for the employed PIPNet facial landmark detector from here. Download the folder snapshots/WFLW and place it into src/mononphm/preprocessing/PIPnet/snapshots.
Next, download the MODNet weights modnet_webcam_portrait_matting.ckpt from here and place them in src/mononphm/preprocessing/MODNet/pretrained.
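Assuming both downloads landed in ~/Downloads (the source paths are hypothetical; the target paths are the ones named above), placing the weights could look like:
# Place the PIPNet snapshots
mkdir -p src/mononphm/preprocessing/PIPnet/snapshots
mv ~/Downloads/WFLW src/mononphm/preprocessing/PIPnet/snapshots/
# Place the MODNet checkpoint
mkdir -p src/mononphm/preprocessing/MODNet/pretrained
mv ~/Downloads/modnet_webcam_portrait_matting.ckpt src/mononphm/preprocessing/MODNet/pretrained/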
You can download the demo data from here and move the contents into the folder specified by MONONPHM_DATA_TRACKING. We provide 6 examples from the FFHQ dataset alongside the preprocessing results.
Additionally, we provide an extension to the NPHM Dataset, which now contains 488 people. To download the data, you will need to fill out the Terms of Service.
We provide pretrained models here. Place the contents into MONONPHM_EXPERIMENT_DIR.
Our test data from the MonoNPHM paper can be downloaded after agreeing to the Terms of Service here.
You can run single-image head reconstruction on a few FFHQ examples using
python scripts/inference/rec.py --model_type nphm --exp_name pretrained_mononphm --ckpt 2500 --seq_name FFHQ_ID --no-intrinsics_provided --downsample_factor 0.33 --no-is_video
where FFHQ_ID can be one of the folder names of the provided demo data. You will find the results in MONONPHM_TRACKING_OUTPUT/stage1/FFHQ_ID.
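For example, assuming the demo data contains a folder named 00004 (the actual folder names depend on the downloaded demo data):
python scripts/inference/rec.py --model_type nphm --exp_name pretrained_mononphm --ckpt 2500 --seq_name 00004 --no-intrinsics_provided --downsample_factor 0.33 --no-is_video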
When working with your own data, you will need to run the preprocessing pipeline, including landmark detection, facial segmentation, background matting, and FLAME tracking to initialize the head pose. To this end you can run
cd scripts/preprocessing
bash run.sh 510_seq_4 --intrinsics_provided
which will run all necessary steps for the sequence named 510_seq_4 located in MONONPHM_DATA_TRACKING.
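To preprocess your own recording, the invocation should follow the same pattern; assuming you placed a folder named my_video inside MONONPHM_DATA_TRACKING (the folder name is hypothetical):
bash run.sh my_video --intrinsics_provided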
The intrinsics_provided flag reads camera_intrinsics.txt from the env_paths.ASSETS folder and provides the metrical tracker with it.
For MonoNPHM tracking, run
python scripts/inference/rec.py --model_type nphm --exp_name pretrained_mononphm_original --ckpt 2500 --seq_name 510_seq_4 --intrinsics_provided --is_video
python scripts/inference/rec.py --model_type nphm --exp_name pretrained_mononphm_original --ckpt 2500 --seq_name 510_seq_4 --intrinsics_provided --is_video --is_stage2
for stage 1 and stage 2 of our proposed optimization, respectively. (Note that stage 2 optimization is only needed for videos.) The results can be found in MONONPHM_TRACKING_OUTPUT/EXP_NAME/stage1/510_seq_4.
After having tracked the Kinect videos, you can run the evaluation script using
python scripts/evaluation/eval.py --model_name pretrained_mononphm_original
There is an --is_debug flag that can be used to visualize the individual steps that are performed before computing Chamfer-style metrics. To average the results across all sequences, you can use scripts/evaluation/gather_metrics.py.
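For example (the invocation of gather_metrics.py is an assumption; check the script for its expected arguments):
# Evaluate with debug visualizations of the intermediate steps
python scripts/evaluation/eval.py --model_name pretrained_mononphm_original --is_debug
# Afterwards, aggregate the metrics across all sequences
python scripts/evaluation/gather_metrics.py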
To train a model yourself, you will first need to generate the necessary training supervision data. Running
python scripts/data_processing/compute_fields_new.py --starti START_PID --endi END_PID
python scripts/data_processing/compute_deformation_field.py
will create the necessary data. Note that compute_fields_new.py in particular can take a long time and consumes a lot of storage. (It is possible to reduce the hard-coded number of training samples per scan in scripts/data_processing/compute_fields_new.py.)
START_PID and END_PID specify the range of participant IDs for which the training data will be computed (excluding END_PID).
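Since the range excludes END_PID, the work can be split across machines without overlap. For example, assuming participant IDs run from 0 to 487 (the dataset contains 488 people; the split below is illustrative):
# Machine 1: participants 0..243
python scripts/data_processing/compute_fields_new.py --starti 0 --endi 244
# Machine 2: participants 244..487
python scripts/data_processing/compute_fields_new.py --starti 244 --endi 488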
To start the training itself, run
python scripts/training/launch_training.py --model_type nphm --cfg_file scripts/configs/mononphm.yaml --exp_name MODEL_NAME --color_branch
If you are training on a headless machine, prepending PYOPENGL_PLATFORM=osmesa might be necessary.
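The full invocation would then be:
PYOPENGL_PLATFORM=osmesa python scripts/training/launch_training.py --model_type nphm --cfg_file scripts/configs/mononphm.yaml --exp_name MODEL_NAME --color_branch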
For our experiments we used 4 GPUs. By default, the training script will use all available GPUs on your machine, and the batch_size parameter in the configs refers to the per-GPU batch size.
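To restrict training to a subset of GPUs, you can use the standard CUDA_VISIBLE_DEVICES environment variable, e.g. to train on two GPUs (so the effective batch size is 2 x batch_size):
CUDA_VISIBLE_DEVICES=0,1 python scripts/training/launch_training.py --model_type nphm --cfg_file scripts/configs/mononphm.yaml --exp_name MODEL_NAME --color_branch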
If you find our code or paper useful, please consider citing
@inproceedings{giebenhain2024mononphm,
author={Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
title={MonoNPHM: Dynamic Head Reconstruction from Monocular Videos},
booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}}
If you find the NPHM dataset helpful, consider citing
@inproceedings{giebenhain2023nphm,
author={Simon Giebenhain and Tobias Kirschstein and Markos Georgopoulos and Martin R{\"{u}}nz and Lourdes Agapito and Matthias Nie{\ss}ner},
title={Learning Neural Parametric Head Models},
booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
year = {2023}}
Contact Simon Giebenhain for questions, comments, and bug reports, or open a GitHub issue.