About personal test wav result

eeskimez / emotalkingface

The code for the paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"

MIT License


Thanks for your interest in our work! The dataset used to train our pre-trained model is limited, so the model does not generalize well to real-world conditions such as different microphone or camera characteristics. We augmented the images and added some noise to the audio, but the model's generalization is still limited. You could try training this model on a bigger dataset first (emotion labels are not needed; just set the labels to zero) and then fine-tuning it on the emotion dataset. This might lead to better generalization.

Best, Emre
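
The two-stage idea in the reply above (train on a larger dataset with the labels set to zero, then fine-tune on the emotion dataset) could be prepared roughly as follows. This is only a sketch of the data bookkeeping, not code from the repository: `Sample`, `build_stage`, the file names, and the label value 3 are hypothetical, and it assumes "set the labels to zero" means giving every unlabeled clip the integer label 0.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Sample:
    """One training clip and its emotion label (hypothetical structure)."""
    wav_path: str
    emotion_label: int


def build_stage(paths: Sequence[str],
                labels: Optional[Sequence[int]] = None) -> List[Sample]:
    """Pair each clip with a label; clips without labels get 0 (assumption)."""
    if labels is None:
        # Larger dataset without emotion annotations: every label is zero.
        labels = [0] * len(paths)
    if len(labels) != len(paths):
        raise ValueError("paths and labels must have the same length")
    return [Sample(p, l) for p, l in zip(paths, labels)]


# Stage 1: larger dataset, no emotion labels available.
pretrain_samples = build_stage(["big_0001.wav", "big_0002.wav"])

# Stage 2: emotion dataset with its real labels, used for fine-tuning.
finetune_samples = build_stage(["emo_0001.wav"], labels=[3])
```

The model would then be trained on `pretrain_samples` first and fine-tuned on `finetune_samples` starting from the resulting weights; the training loop itself depends on the repository's code and is left out here.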


About personal test wav result #3