eeskimez / emotalkingface

The code for the paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"
MIT License

Training gives same frame in whole video #4

Open parth-shettiwar opened 2 years ago

parth-shettiwar commented 2 years ago

Hi, we have been trying to run the code. The pretrained model you provided works perfectly fine; however, when we train the model from scratch, the generated output is always the same frame for the whole video, with the audio running in the background and no changes in facial or lip features. What could be the potential reasons for this? Did you observe anything similar during your training?

The following are the changes we made to the code (see the data-filtering sketch below):

1) We train the model on only 2 emotions, Happy and Sad; all other emotions are removed when creating the dataset. We also selected only a subset of the dataset for these 2 emotions (around 500 videos).
2) We pretrained the discriminator and generator for 5 epochs and performed joint training for 7 epochs.
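For reference, a minimal sketch of the kind of filtering described in point 1. This is not the repository's data-preparation code; it assumes CREMA-D-style filenames such as `1001_DFA_HAP_XX.mp4`, where the third underscore-separated field is the emotion code, and would need to be adapted to the actual directory layout.

```python
# Hypothetical sketch: keep only Happy/Sad clips when building the file list.
# Assumes the emotion code is the third underscore-separated field of the
# filename stem (CREMA-D convention); adjust to your actual naming scheme.
from pathlib import Path

KEEP_EMOTIONS = {"HAP", "SAD"}

def filter_videos(video_dir: str):
    kept = []
    for path in sorted(Path(video_dir).glob("*.mp4")):
        parts = path.stem.split("_")
        if len(parts) >= 3 and parts[2] in KEEP_EMOTIONS:
            kept.append(path)
    return kept

if __name__ == "__main__":
    files = filter_videos("data/crema_d/videos")  # hypothetical path
    print(f"Keeping {len(files)} Happy/Sad clips")
```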

Is this due to incorrect dataset preparation, the absence of other emotions (e.g., a Neutral face), incomplete training (too few epochs), or some other reason?

Thanks in advance

eeskimez commented 2 years ago

Thanks for your interest in our work! What learning rate are you using? It might be too low; please try 1e-3 or 1e-4. The full dataset, which includes all emotions, is already small, so by using only two emotions I would expect worse results. However, you should still be able to see some lip motion. You can try training with only the reconstruction loss to see if the lips move.
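One way to run that reconstruction-only sanity check is sketched below. This is not the repository's training script: `generator`, the dataloader, and the `(image, audio, target_frames)` batch layout are assumptions. The point is simply to drop the adversarial and emotion losses, keep only a pixel-wise L1 loss, and try a higher learning rate (1e-3 or 1e-4) to see whether the lips start moving.

```python
# Minimal reconstruction-only training loop (a sketch, not the repo's code).
import torch
import torch.nn.functional as F

def train_recon_only(generator, dataloader, device="cuda", lr=1e-4, epochs=5):
    generator.to(device).train()
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for epoch in range(epochs):
        for image, audio, target_frames in dataloader:  # assumed batch layout
            image = image.to(device)
            audio = audio.to(device)
            target_frames = target_frames.to(device)
            pred_frames = generator(image, audio)          # assumed call signature
            loss = F.l1_loss(pred_frames, target_frames)   # reconstruction loss only
            opt.zero_grad()
            loss.backward()
            opt.step()
        print(f"epoch {epoch}: recon loss {loss.item():.4f}")
```

If the lips move under this loss alone but freeze once the GAN and emotion terms are added back, the issue is more likely loss balancing or learning rate than data preparation.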

Also, please check the dataset before training. You can animate the images with audio to see if there are any issues with data preparation.
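A quick way to do that check, assuming the prepared samples are stored as per-sample frame folders plus a matching audio file (hypothetical paths below), is to re-mux frames and audio with ffmpeg and watch whether the lips line up with the speech:

```python
# Sanity-check a prepared sample by writing its frames back to a video
# with the corresponding audio track (an assumption about the data layout,
# not the repository's tooling). Requires ffmpeg on the PATH.
import subprocess

def preview_sample(frames_pattern: str, audio_path: str, out_path: str, fps: int = 25):
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps), "-i", frames_pattern,  # e.g. frame_%05d.png
        "-i", audio_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-shortest",
        out_path,
    ], check=True)

if __name__ == "__main__":
    preview_sample("prepared/sample_0001/frame_%05d.png",  # hypothetical paths
                   "prepared/sample_0001/audio.wav",
                   "check_sample_0001.mp4")
```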

Best, Emre