Official Implementation of WACV 2024 paper, "CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer"
Efficient and controllable head avatar generation from a single image with point-based neural rendering.
Reconstructing personalized animatable head avatars has significant implications in the fields of AR/VR. Existing methods for achieving explicit face control of 3D Morphable Models (3DMM) typically rely on multi-view images or videos of a single subject, making the reconstruction process complex. Additionally, the traditional rendering pipeline is time-consuming, limiting real-time animation possibilities. In this paper, we introduce CVTHead, a novel approach that generates controllable neural head avatars from a single reference image using point-based neural rendering. CVTHead considers the sparse vertices of mesh as the point set and employs the proposed Vertex-feature Transformer to learn local feature descriptors for each vertex. This enables the modeling of long-range dependencies among all the vertices. Experimental results on the VoxCeleb dataset demonstrate that CVTHead achieves comparable performance to state-of-the-art graphics-based methods. Moreover, it enables efficient rendering of novel human heads with various expressions, head poses, and camera views. These attributes can be explicitly controlled using the coefficients of 3DMMs, facilitating versatile and realistic animation in real-time scenarios.
conda create -n cvthead python=3.9
conda activate cvthead
pip install -r requirements.txt
cd data/
bash fetch_data.sh
cd ..
Please go to data/README.md
for more details.
Download our pre-trained model cvthead.pt
from Google Drive
and put it under data/
folder
Here is a demo to use CVTHead for cross-identity face reenactment
python inference.py --src_pth examples/1.png --drv_pth examples/2.png --out_pth examples/output.png --ckpt_pth data/cvthead.pt
Here is a demo for face generation under the control of FLAME coefficients
python inference.py --src_pth examples/1.png --out_pth examples --ckpt_pth data/cvthead.pt --flame
Note: some videos may no longer be available on Youtube. Due to the copyright and privacy issue, I cannot share these face videos to others.
face_alignment
The data organization should looks like
--VoxCeleb1
---- vox_video
------ train
-------- xxxx.mp4
-------- ......
------ test
-------- id10280#NXjT3732Ekg#001093#001192.mp4
-------- xxxx.mp4
-------- ......
---- vox_lmks_meta
------ train
-------- xxxx.pkl
-------- ......
------ test
-------- id10280#NXjT3732Ekg#001093#001192.pkl
-------- xxxx.pkl
-------- ......
torchrun --standalone --nnodes 1 --nproc_per_node 2 main_stage1.py --config configs/vox1.yaml
torchrun --standalone --nnodes 1 --nproc_per_node 2 main_stage2.py --config configs/vox1.yaml
ROME
DECA
Spiralnet++
face-parsing.PyTorch
face-alignment
If you found this code helpful, please consider citing:
@article{ma2023cvthead,
title={CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer},
author={Ma, Haoyu and Zhang, Tong and Sun, Shanlin and Yan, Xiangyi and Han, Kun and Xie, Xiaohui},
journal={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2024}
}