eeskimez / emotalkingface

The code for the paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"
MIT License

Recreating training from scratch #17

Closed · marcus-reyes closed this issue 4 months ago

marcus-reyes commented 1 year ago

Hi, I am trying to retrain everything from scratch.

Could you share further details on the training hyperparameters used for all three stages (emotion discriminator training, pretraining, and training)? Alternatively, could you share the final expected loss values for the curves plotted on TensorBoard?

As mentioned in another issue, using the pretrained model works fine. However, training the model from scratch (following the instructions in this repo) produces different results; at best I have only been able to reproduce the opening of the mouth. I also noticed some discrepancies between the paper and the source code, for example:

- the paper states that all discriminator learning rates are 1e-4, but train.py defaults them to lower values;
- the paper states 32 frames per sample, but train.py defaults to 25.

I have also cut training shorter than the default 1000 epochs, since the 100k iterations stated in the paper seem to correspond to only about 100 epochs, if I understand this correctly.
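To make that conversion explicit, here is the rough arithmetic as a minimal sketch; the dataset size and batch size below are placeholder assumptions for illustration, not values I have confirmed from the repo or the paper:

```python
# Sanity check: how many epochs do the paper's 100k iterations correspond to?
# num_samples and batch_size are ASSUMED placeholder values, not repo values.
def iterations_to_epochs(total_iterations, num_samples, batch_size):
    """Approximate number of epochs for a given number of optimizer steps."""
    iterations_per_epoch = num_samples // batch_size
    return total_iterations / iterations_per_epoch

# E.g. ~128k training clips at batch size 128 -> 1000 iterations per epoch,
# so 100k iterations come out to roughly 100 epochs.
print(iterations_to_epochs(100_000, num_samples=128_000, batch_size=128))
```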

Apologies for the nitpicking; I'm trying to recreate the training and am at a loss. Your response would greatly help me reduce the time spent experimenting.

marcus-reyes commented 4 months ago

Eventually got reasonably close results by sticking as closely as possible to the numbers in the paper.
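For anyone else retracing this, a minimal sketch of the overrides I mean. Only the discriminator learning rate (1e-4), the 32 frames per sample, and the 100k-iteration budget come from the paper; the key names below are hypothetical and may not match the actual argparse flags, so check train.py for the real ones:

```python
# Hypothetical paper-aligned overrides for train.py defaults.
# The dict keys are ILLUSTRATIVE names, not confirmed train.py arguments.
PAPER_OVERRIDES = {
    "disc_frame_lr": 1e-4,      # paper: all discriminator LRs are 1e-4
    "disc_emo_lr": 1e-4,
    "disc_sync_lr": 1e-4,
    "num_frames": 32,           # paper: 32 frames per sample (code default: 25)
    "max_iterations": 100_000,  # paper: 100k iterations (~100 epochs here)
}

def apply_overrides(args, overrides=PAPER_OVERRIDES):
    """Patch a parsed argparse.Namespace in place with paper-aligned values."""
    for key, value in overrides.items():
        setattr(args, key, value)
    return args
```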