UttaranB127 / speech2affective_gestures

This is the official implementation of the paper "Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning".
https://gamma.umd.edu/s2ag/
MIT License
affective-computing gesture-generation intelligent-agent speech-processing text-processing virtual-agent

Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning

This README describes how to use the official code for the paper Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning. Please use the following citation if you find our work useful:

@inproceedings{bhattacharya2021speech2affectivegestures,
author = {Bhattacharya, Uttaran and Childs, Elizabeth and Rewkowski, Nicholas and Manocha, Dinesh},
title = {Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning},
year = {2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
series = {MM '21}
}

Installation

Our scripts have been tested on Ubuntu 18.04 LTS with

  1. Clone this repository.

We use $BASE to refer to the base directory for this project (the directory containing main_v2.py). Change the present working directory to $BASE.

  2. [Optional but recommended] Create a conda environment for the project and activate it.
conda create -n s2ag-env python=3.7
conda activate s2ag-env
  3. Install espeak.
sudo apt-get update && sudo apt-get install espeak
  4. Install PyTorch following the official instructions.

  5. Install all other package requirements.

pip install -r requirements.txt

Note: You might need to manually uninstall and reinstall numpy for torch to work. Likewise, you might need to manually uninstall and reinstall matplotlib and kiwisolver for them to work.
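A quick post-install check can show whether the reinstall mentioned in the note is needed. The sketch below is a hypothetical helper, not part of this repository: it tries to import the packages named in the note and looks for an espeak executable on PATH. The module list is an assumption and should be adjusted to match requirements.txt.

```python
import importlib
import shutil

# Packages named in the installation note; this list is an assumption and
# should be extended to match the entries in requirements.txt.
MODULES_TO_CHECK = ["torch", "numpy", "matplotlib", "kiwisolver"]


def check_imports(names):
    """Try importing each module and map every failure to its error message."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # broken installs can raise more than ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def espeak_available():
    """Return True if an espeak executable can be found on PATH."""
    return shutil.which("espeak") is not None


if __name__ == "__main__":
    for module, error in check_imports(MODULES_TO_CHECK).items():
        print(f"{module} failed to import ({error})")
    print("espeak found on PATH:", espeak_available())
```

Any package reported as failing is a candidate for the manual uninstall and reinstall described in the note.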

Downloading the datasets

  1. The Ted Gestures dataset is available for download here, originally hosted at https://github.com/ai4r/Gesture-Generation-from-Trimodal-Context.

  2. The Trinity Gesture dataset is available for download on submitting an access request here.

Running the code

Run the main_v2.py file with the appropriate command line arguments.

python main_v2.py <args list>

The full list of arguments is available inside main_v2.py.

For any argument not specified in the command line, the code uses the default value for that argument.
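The default-value behavior can be illustrated with a minimal argparse sketch. This is not the parser from main_v2.py: the str2bool helper, the --batch-size argument, and both default values are assumptions made for illustration. Only the --train-s2ag flag name comes from this README.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings (hypothetical helper, not from the repository)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build an illustrative parser; the real argument list lives in main_v2.py."""
    parser = argparse.ArgumentParser(description="Illustrative parser only")
    # --train-s2ag is named in this README; its default of True is an assumption.
    parser.add_argument("--train-s2ag", type=str2bool, default=True)
    # --batch-size is a made-up argument, used only to show a second default.
    parser.add_argument("--batch-size", type=int, default=32)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # No arguments given: every value falls back to its default.
    print(parser.parse_args([]))
    # One argument given: only that value changes, the rest keep their defaults.
    print(parser.parse_args(["--train-s2ag", "False"]))
```

An explicit string-to-boolean converter is used here because argparse's type=bool treats any non-empty string, including "False", as true.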

On running main_v2.py, the code will train the network and generate sample gestures post-training.

Pre-trained models

We also provide a pre-trained model for download. If using this model, save it inside the directory $BASE/models/ted_db (create the directory if it does not exist). Set the command-line argument --train-s2ag to False to skip training and use this model directly for evaluation. The generated samples are stored in the automatically created render directory.

Additionally, we provide the pre-trained weights of the embedding network required to estimate the Fréchet Gesture Distance between the ground-truth and the synthesized gestures. If using these weights, store them in the directory $BASE/outputs.
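The directory setup described above can be sketched as follows. The function names are hypothetical and not part of the repository. The paths and the --train-s2ag flag come from this README, and the downloaded files still have to be copied into place by hand.

```python
from pathlib import Path


def prepare_pretrained_dirs(base):
    """Create the directories this README names for the downloaded weights."""
    base = Path(base)
    model_dir = base / "models" / "ted_db"  # the pre-trained model goes here
    outputs_dir = base / "outputs"          # the embedding-network weights go here
    for directory in (model_dir, outputs_dir):
        # parents=True builds missing parent folders; exist_ok=True makes reruns safe.
        directory.mkdir(parents=True, exist_ok=True)
    return model_dir, outputs_dir


def evaluation_command():
    """Return the command that skips training and evaluates the saved model."""
    return ["python", "main_v2.py", "--train-s2ag", "False"]
```

Calling prepare_pretrained_dirs(".") from $BASE creates both directories; joining the list returned by evaluation_command() with spaces gives the command to run from $BASE afterwards.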