ShenhanQian / VHAP

A complete head tracking pipeline from videos to NeRF/3DGS-ready datasets.

Versatile Head Alignment with Adaptive Appearance Priors



This work is made available under CC-BY-NC-SA-4.0. The repository is derived from the multi-view head tracker of GaussianAvatars, which is subject to the following statement:

Toyota Motor Europe NV/SA and its affiliated companies retain all intellectual property and proprietary rights in and to this software and related documentation. Any commercial use, reproduction, disclosure or distribution of this software and related documentation without an express license agreement from Toyota Motor Europe NV/SA is strictly prohibited.

On top of the original repository, we add support for monocular videos and provide a complete set of scripts, from video preprocessing to result export, for NeRF/3DGS-style applications.


git clone

conda create --name VHAP -y python=3.10
conda activate VHAP

# Install CUDA and ninja for compilation
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit ninja cmake  # use the right CUDA version
ln -s "$CONDA_PREFIX/lib" "$CONDA_PREFIX/lib64"  # to avoid error "/usr/bin/ld: cannot find -lcudart"
conda env config vars set CUDA_HOME=$CONDA_PREFIX  # for compilation

# Install PyTorch (make sure that the CUDA version matches with "Step 1")
pip install torch torchvision --index-url
# or
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
# make sure torch.cuda.is_available() returns True

pip install -e .
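
The installation steps above ask you to confirm that `torch.cuda.is_available()` returns True. A minimal sketch of that check is shown here; the helper name `cuda_status` is made up for illustration and is not part of VHAP. It only reports what the installed PyTorch build sees and does not fix a mismatch.

```python
def cuda_status():
    """Describe whether the installed PyTorch build can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    # torch.version.cuda is the CUDA version PyTorch was built against
    # (None for CPU-only builds); compare it with the installed toolkit.
    built_for = torch.version.cuda
    if torch.cuda.is_available():
        return f"CUDA available: torch {torch.__version__}, built for CUDA {built_for}"
    return f"CUDA NOT available: torch {torch.__version__}, built for CUDA {built_for}"


if __name__ == "__main__":
    print(cuda_status())
```

If the message reports that CUDA is not available, the usual suspects are a CPU-only PyTorch build or a CUDA version that differs from the toolkit installed earlier.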


  • We use an adjusted version of nvdiffrast for backface-culling. To completely remove previous versions and compiled pytorch extensions, you can execute

    pip uninstall nvdiffrast
    rm -r ~/.cache/torch_extensions/*/nvdiffrast*
  • We use STAR for landmark detection by default. Alternatively, face-alignment is faster but less accurate.



Our code relies on FLAME. Please download assets from the official website and store them in the paths below:

NOTE: It is possible to use FLAME 2020 by downloading it to asset/flame/generic_model.pkl. FLAME_MODEL_PATH then needs to be updated accordingly.

Video Data


To get access to the NeRSemble dataset, please submit a request via the Google Form. The directory structure is expected to be like this.


We use monocular video sequences following INSTA. You can download raw videos from LRZ.


For Monocular Videos

For NeRSemble Dataset


Photometric alignment is versatile but sometimes sensitive.

Color affinity: If the color of a point on the foreground contour is too close to the background color, the static_offset can go wild. You may try a different background color with --data.background_color white or --data.background_color black. You can also disable static_offset with --model.no_use_static_offset.
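
To decide which background color to try first, one option is to compare a few colors sampled along the foreground contour against both candidates. The sketch below is purely illustrative and not part of the tracker: `suggest_background` and `min_contrast` are hypothetical helpers, and colors are assumed to be RGB tuples in the range [0, 1].

```python
import math

# The two candidate values accepted by --data.background_color in the text above.
BACKGROUNDS = {"white": (1.0, 1.0, 1.0), "black": (0.0, 0.0, 0.0)}


def min_contrast(contour_colors, background):
    """Smallest RGB distance between any contour sample and the background."""
    return min(math.dist(color, background) for color in contour_colors)


def suggest_background(contour_colors):
    """Pick the background whose worst-case contrast to the contour is largest."""
    return max(
        BACKGROUNDS,
        key=lambda name: min_contrast(contour_colors, BACKGROUNDS[name]),
    )


if __name__ == "__main__":
    # Hand-picked sample colors: light skin and a pale collar on the contour.
    print(suggest_background([(0.95, 0.92, 0.90), (0.80, 0.70, 0.60)]))
```

The worst-case (minimum) distance is used because a single contour point that blends into the background is enough to cause the problem described above.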

Occlusion: When the neck is occluded by collars, the photometric gradients may squeeze and stretch the neck into unnatural shapes. This problem can usually be alleviated by disabling photometric alignment in the affected regions. We hard-coded the occlusion status for some subjects in the NeRSemble dataset with the occluded_table. You can extend the table or temporarily override it with, e.g., --model.occluded neck_lower boundary.
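
The idea of disabling photometric alignment in certain regions can be sketched as follows. This is a conceptual illustration under stated assumptions, not VHAP's implementation: `masked_photometric_loss`, the per-vertex `residuals`, and the `regions` labels are hypothetical. The names `neck_lower` and `boundary` come from the flag example above, while `face` is only a placeholder label.

```python
def masked_photometric_loss(residuals, regions, occluded):
    """Average the photometric residuals over vertices outside occluded regions.

    residuals: one non-negative error value per vertex
    regions:   one region name per vertex, same length as residuals
    occluded:  set of region names to exclude from the loss
    """
    kept = [r for r, region in zip(residuals, regions) if region not in occluded]
    if not kept:
        return 0.0  # every vertex is occluded, so nothing contributes
    return sum(kept) / len(kept)


if __name__ == "__main__":
    residuals = [0.1, 0.9, 0.2, 0.8]
    regions = ["face", "neck_lower", "face", "boundary"]
    # With nothing masked, the large collar residuals dominate the loss.
    print(masked_photometric_loss(residuals, regions, set()))
    # Masking the occluded regions leaves only the reliable face vertices.
    print(masked_photometric_loss(residuals, regions, {"neck_lower", "boundary"}))
```

Excluded vertices contribute no gradient, so the collar can no longer pull the neck geometry toward it.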

Limited degrees of freedom: Another limitation comes from the FLAME model. FLAME is great since it covers the whole head and neck. However, it has only one joint for the neck, located between the neck and the head. This means the lower part of the neck cannot move relative to the torso, which limits the model's ability to capture large head movements. For example, it is very hard to align the lower neck and the head well at the same time for the EXP-1-head sequence in the NeRSemble dataset because of this missing degree of freedom.

You are welcome to report more failure cases and help us improve the tracker.

Interactive Viewers

Our method relies on vertex masks defined on FLAME. We add custom masks to enrich the original ones. You can play with regions in our FLAME Editor to see what each mask looks like.

python vhap/

We also provide a FLAME viewer for you to interact with a tracked sequence.

python vhap/ --param_path output/nersemble/074_EMO-1_v16_DS4_wBg_staticOffset/2024-09-09_15-49-02/tracked_flame_params_30.npz
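
Before opening a tracked sequence in the viewer, it can help to check which arrays the `.npz` file contains. An `.npz` file is a zip archive with one `.npy` member per array, so the names can be listed with the standard library alone. The helper `list_npz_arrays` below is a hypothetical name for illustration, and the sketch makes no assumption about which keys VHAP writes.

```python
import zipfile


def list_npz_arrays(npz_path):
    """Return the names of the arrays stored in an .npz archive."""
    suffix = ".npy"
    with zipfile.ZipFile(npz_path) as archive:
        return [
            name[: -len(suffix)]
            for name in archive.namelist()
            if name.endswith(suffix)
        ]


if __name__ == "__main__":
    import sys

    # Usage: python list_npz.py path/to/tracked_flame_params_30.npz
    for array_name in list_npz_arrays(sys.argv[1]):
        print(array_name)
```

With NumPy installed, `numpy.load(path).files` gives the same list of names and also lets you read the arrays themselves.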


Please kindly cite our repository and preceding paper if you find our software or algorithm useful for your research.

  title   = "Versatile Head Alignment with Adaptive Appearance Priors",
  author  = "Qian, Shenhan",
  year    = "2024",
  month   = "September",
  url     = ""
  title={Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians},
  author={Qian, Shenhan and Kirschstein, Tobias and Schoneveld, Liam and Davoli, Davide and Giebenhain, Simon and Nie{\ss}ner, Matthias},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},