eeskimez / emotalkingface

The code for the paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"
MIT License

About personal test wav result #3

Open renrenzsbbb opened 2 years ago

renrenzsbbb commented 2 years ago

Thanks for your great work! When I test the pretrained model with the original wav files extracted from the flv videos, it returns good results. However, when I use my own wav recordings, the output image is sometimes blurry and deformed. Can you give me some suggestions to solve this? Thanks in advance.

eeskimez commented 2 years ago

Thanks for your interest in our work! The dataset used to train our pre-trained model is limited, so it cannot generalize well to real-world conditions such as different microphone or camera characteristics. We augmented the images and added some noise to the audio, but the model's generalization is still limited. You could try training this model on a bigger dataset (without emotion labels, just set the labels to zero) and then fine-tuning it on the emotion dataset. This might lead to better generalization.
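Roughly, the two-stage recipe could look like the sketch below. This is only an illustration of the idea, not code from this repo: `TalkingFaceModel`, `BigFaceAudioDataset`, and `EmotionDataset` are hypothetical placeholders you would replace with the actual model and data loaders, and the losses, learning rates, and epoch counts are arbitrary.

```python
# Minimal sketch of the suggested two-stage training, assuming a standard
# PyTorch setup. All class names below are placeholders, not this repo's API.
import torch
from torch.utils.data import DataLoader

NUM_PRETRAIN_EPOCHS = 50   # arbitrary example values
NUM_FINETUNE_EPOCHS = 10

def run_epoch(model, loader, optimizer, zero_emotion=False):
    for image, audio, emotion, target in loader:
        if zero_emotion:
            # Stage 1: no emotion labels available, so condition on zeros.
            emotion = torch.zeros_like(emotion)
        pred = model(image, audio, emotion)
        loss = torch.nn.functional.l1_loss(pred, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

model = TalkingFaceModel()  # placeholder for the talking-face generator

# Stage 1: pre-train on a large dataset without emotion labels.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
big_loader = DataLoader(BigFaceAudioDataset(), batch_size=32, shuffle=True)
for _ in range(NUM_PRETRAIN_EPOCHS):
    run_epoch(model, big_loader, opt, zero_emotion=True)

# Stage 2: fine-tune on the emotion-labeled dataset, typically with a
# smaller learning rate so the pre-trained weights are not destroyed.
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
emo_loader = DataLoader(EmotionDataset(), batch_size=32, shuffle=True)
for _ in range(NUM_FINETUNE_EPOCHS):
    run_epoch(model, emo_loader, opt, zero_emotion=False)
```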

Best, Emre