
DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures

This is the official code for DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures

Installation

To install dependencies:

  1. Install packages
    pip install -r requirements.txt
  2. Install Thin-Plate-Spline-Motion-Model

    Follow the installation directions at https://github.com/yoyo-nb/Thin-Plate-Spline-Motion-Model/

    Place the repository at scripts/tps/ (see the example commands below).
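
For reference, assuming you start from the repository root, the two steps above might look like the following shell commands. This is only a sketch; any pretrained checkpoints required by Thin-Plate-Spline-Motion-Model should be obtained as described in that repository's README.

    # Step 1: install Python dependencies
    pip install -r requirements.txt

    # Step 2: clone Thin-Plate-Spline-Motion-Model so it lives at scripts/tps/
    git clone https://github.com/yoyo-nb/Thin-Plate-Spline-Motion-Model scripts/tps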

Dataset

Follow the instructions from MRAA (Motion Representations for Articulated Animation) to download the TED-talks dataset.
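
If MRAA refers, as assumed here, to the Motion Representations for Articulated Animation repository, data preparation might start with the commands below; the clone target and preprocessing steps are assumptions, so defer to MRAA's own README for the exact download procedure and output layout.

    # Assumed MRAA repository; follow its TED-talks instructions after cloning
    git clone https://github.com/snap-research/articulated-animation
    # ...run the TED-talks download/preprocessing steps described in that README...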

Training

To train the model, run:

python scripts/train_tpsm.py --config config/pose_diffusion.yml
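
Before launching a long run, it can help to confirm the config file parses; this optional check assumes PyYAML is installed (it is not necessarily listed in requirements.txt):

    python -c "import yaml; print(yaml.safe_load(open('config/pose_diffusion.yml')))"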

Inference

To generate videos, run:

python scripts/test_tpsm.py long <checkpoint_path> <test_data_path>
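
For example, with a trained checkpoint and a processed test split (both paths below are placeholders, not files shipped with the repository):

    python scripts/test_tpsm.py long output/pose_diffusion/checkpoint.pt data/ted-talks/test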

Citation

If you find our work useful, please cite it as:

@InProceedings{Hogue2024,
    author    = {Hogue, Steven and Zhang, Chenxu and Daruger, Hamza and Tian, Yapeng and Guo, Xiaohu},
    title     = {DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1922-1931}
}

Acknowledgement