Sxjdwang / TalkLip

Why is the WER 66% when I use your checkpoint to train the model? #22

Open jtTATannn opened 1 year ago

jtTATannn commented 1 year ago

Hello! Thank you for your outstanding work. I ran into a problem: why is the WER 66% when I use your checkpoints to train the model? Is it because I didn't use your lip observer checkpoint? How should I use the lip observer checkpoint you provided?

Sxjdwang commented 1 year ago

The official AV-HuBERT checkpoint differs from the lip observer checkpoint in two main ways:

  1. The official checkpoint is fine-tuned on LRS3, while the lip observer checkpoint is fine-tuned on LRS2.
  2. The official checkpoint requires the input images to be preprocessed to remove the effects of rotation and scale. This preprocessing is not suitable for generating talking faces, so we fine-tuned the lip observer without this step.
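
To make the second point concrete, here is a toy sketch of what removing rotation and scale can mean for facial landmarks. It is not the AV-HuBERT preprocessing code: the function name `normalize_landmarks`, the use of two eye points as anchors, and the `target_eye_dist` parameter are all assumptions made for illustration.

```python
import math


def normalize_landmarks(points, left_eye, right_eye, target_eye_dist=1.0):
    """Rotate and scale 2D points so the eye line is horizontal with a fixed length.

    Hypothetical helper for illustration only: the left eye is moved to the
    origin and the right eye ends up at (target_eye_dist, 0).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("eye landmarks must be distinct")

    angle = math.atan2(dy, dx)      # in-plane tilt of the eye line
    scale = target_eye_dist / dist  # factor that fixes the eye distance
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)

    normalized = []
    for x, y in points:
        # Translate so the left eye is the origin, then undo tilt and scale.
        x0, y0 = x - left_eye[0], y - left_eye[1]
        normalized.append((scale * (x0 * cos_a - y0 * sin_a),
                           scale * (x0 * sin_a + y0 * cos_a)))
    return normalized
```

After this kind of normalization every frame shares the same head tilt and face size. A model fine-tuned on such inputs expects them at inference time too, whereas generated talking-face frames keep the speaker's original pose.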

For instructions on how to use the lip observer checkpoint, please refer to the README in the "evaluation" directory.
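
When comparing WER numbers across checkpoints, it can help to recall how the metric is defined: the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The following is a minimal sketch of that definition, not the repository's evaluation script. The function name `word_error_rate` and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c", "a x c")` counts one substitution over three reference words.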

jtTATannn commented 1 year ago

Thank you so much!