
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model [ECCV2024]

Useful Links

[Homepage]      [arXiv]      [Video]     
UniTalker Architecture

UniTalker generates realistic facial motion from diverse audio domains, including clean and noisy voices in various languages, text-to-speech-generated audio, and even noisy songs with background music.

UniTalker can output motion in multiple annotation formats.

For datasets with new annotations, one can simply plug new output heads into UniTalker and train it together with the existing datasets or solely on the new ones, avoiding retopology.

Installation

Environment

  conda create -n unitalker python=3.10
  conda activate unitalker
  conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
  pip install transformers librosa tensorboardX smplx chumpy numpy==1.23.5 opencv-python
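
To verify the environment before moving on, you can run a quick import check (a minimal sketch; the exact CUDA build on your machine may differ):

  # sanity_check.py -- quick environment check, not part of the repo
  import torch
  import numpy as np
  import transformers
  import librosa

  print("torch:", torch.__version__)            # expected: 2.2.0
  print("numpy:", np.__version__)               # expected: 1.23.5
  print("transformers:", transformers.__version__)
  print("librosa:", librosa.__version__)
  print("CUDA available:", torch.cuda.is_available())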

Inference

Download checkpoints, PCA models and template resources

UniTalker-B-[D0-D7]: The base model in the paper. Download it and place it in ./pretrained_models.

UniTalker-L-[D0-D7]: The default model in the paper. Please try the base model first to make sure the pipeline runs through.

Unitalker-data-release-V1: The released datasets, PCA models, data-split JSON files and id-template NumPy arrays. Download it and unzip it in this repo.

FLAME2020: Please download FLAME 2020 and move generic_model.pkl to resources/binary_resources/flame.pkl.

Use git lfs pull to get ./resources.zip and ./test_audios.zip, then unzip them in this repo.

Finally, these files should be organized as follows:

├── pretrained_models
│   ├── UniTalker-B-D0-D7.pt
│   └── UniTalker-L-D0-D7.pt
├── resources
│   ├── binary_resources
│   │   ├── 02_flame_mouth_idx.npy
│   │   ├── ...
│   │   └── vocaset_FDD_wo_eyes.npy
│   └── obj_template
│       ├── 3DETF_blendshape_weight.obj
│       ├── ...
│       └── meshtalk_6172_vertices.obj
└── unitalker_data_release_V1
    ├── D0_BIWI
    │   ├── id_template.npy
    │   └── pca.npz
    ├── D1_vocaset
    │   ├── id_template.npy
    │   └── pca.npz
    ├── D2_meshtalk
    │   ├── id_template.npy
    │   └── pca.npz
    ├── D3D4_3DETF
    │   ├── D3_HDTF
    │   └── D4_RAVDESS
    ├── D5_unitalker_faceforensics++
    │   ├── id_template.npy
    │   ├── test
    │   ├── test.json
    │   ├── train
    │   ├── train.json
    │   ├── val
    │   └── val.json
    ├── D6_unitalker_Chinese_speech
    │   ├── id_template.npy
    │   ├── test
    │   ├── test.json
    │   ├── train
    │   ├── train.json
    │   ├── val
    │   └── val.json
    └── D7_unitalker_song
        ├── id_template.npy
        ├── test
        ├── test.json
        ├── train
        ├── train.json
        ├── val
        └── val.json
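
Optionally, a short script can spot-check that the main files are in place (a minimal sketch; it only tests a few representative paths from the tree above):

  # check_layout.py -- spot-check the expected file layout
  from pathlib import Path

  required = [
      "pretrained_models/UniTalker-B-D0-D7.pt",
      "resources/binary_resources/flame.pkl",
      "unitalker_data_release_V1/D0_BIWI/pca.npz",
      "test_audios",
  ]
  missing = [p for p in required if not Path(p).exists()]
  print("missing:", missing if missing else "none, layout looks good")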

Demo

  python -m main.demo --config config/unitalker.yaml test_out_path ./test_results/demo.npz
  python -m main.render ./test_results/demo.npz ./test_audios ./test_results/
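
The demo writes its predictions to ./test_results/demo.npz, and the render step turns them into videos. If you want to inspect the raw output yourself, something like the following works (illustrative only; the exact key names inside the npz file depend on the release):

  # inspect_demo.py -- peek at the raw demo output
  import numpy as np

  out = np.load("./test_results/demo.npz", allow_pickle=True)
  for key in out.files:
      arr = out[key]
      # each entry should be a motion sequence, e.g. (num_frames, num_vertices * 3)
      print(key, getattr(arr, "shape", type(arr)))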

Train

Download Data

Unitalker-data-release-V1 contains D5, D6 and D7. These datasets have already been processed and split into train, validation and test sets. Please use these three datasets to try the training step. If you want to train the model on D0-D7, you need to download the remaining datasets from these links: D0: BIWI. D1: VOCASET. D2: meshtalk. D3, D4: 3DETF.

Modify Config and Train

Please modify dataset and duplicate_list in config/unitalker.yaml according to the datasets you have prepared, ensuring that both lists have the same length; a quick check is sketched below.
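
Before launching, the two lists can be sanity-checked with a few lines of Python (a minimal sketch, assuming PyYAML is installed and the top-level keys are named dataset and duplicate_list as above):

  # check_config.py -- verify dataset and duplicate_list line up
  import yaml

  with open("config/unitalker.yaml") as f:
      cfg = yaml.safe_load(f)

  datasets, duplicates = cfg["dataset"], cfg["duplicate_list"]
  assert len(datasets) == len(duplicates), (
      f"dataset has {len(datasets)} entries, "
      f"duplicate_list has {len(duplicates)}"
  )
  print("config OK:", list(zip(datasets, duplicates)))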

  python -m main.train --config config/unitalker.yaml