Sxjdwang / TalkLip

Why is the WER 66% when I use your checkpoint to train the model? #22

Open · jtTATannn opened 10 months ago

jtTATannn commented 10 months ago

Hello! Thank you for your outstanding work. I've run into a problem: why is the WER 66% when I use your checkpoints to train the model? Is it because I didn't use your lip observer checkpoint? How should I use the lip observer checkpoint you provided?
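For reference, WER is the word-level edit distance (substitutions + deletions + insertions) between the recognized transcript and the reference, divided by the number of reference words. A WER of 66% therefore means roughly two errors for every three reference words. Below is a minimal, generic sketch of that computation. It is not the evaluation script shipped with TalkLip, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER: Levenshtein distance over whitespace-split words / #reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                cur[j - 1] + 1,      # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Corpus-level WER is usually computed by summing the edit distances and the reference lengths over all utterances before dividing, rather than averaging per-utterance ratios.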

Sxjdwang commented 10 months ago

The official AV-HuBERT checkpoint differs from the lip observer checkpoint in two main ways:

  1. The official checkpoint is fine-tuned on LRS3, while the lip observer checkpoint is fine-tuned on LRS2.
  2. The official checkpoint requires the input images to be preprocessed to remove the effects of rotation and scale. This preprocessing is not suitable for generating talking faces, so we fine-tuned the lip observer without it.
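To illustrate what "removing rotation and scale" can mean, here is a minimal sketch that assumes the preprocessing fits a least-squares 2D similarity transform (rotation + uniform scale + translation) from detected facial landmarks to a fixed reference template. This is an assumption for illustration only, not AV-HuBERT's actual preprocessing code; the function names and the `(x, y)` landmark format are hypothetical. Points are treated as complex numbers, so the transform is `w = a*z + b`, where `abs(a)` is the scale and `cmath.phase(a)` is the rotation angle.

```python
import cmath


def fit_similarity(src, dst):
    """Least-squares similarity transform mapping landmarks src -> dst.

    src, dst: equal-length lists of (x, y) tuples (hypothetical format).
    Returns complex (a, b) such that a*z + b approximates each dst point.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two corresponding points")
    zs = [complex(x, y) for x, y in src]
    zd = [complex(x, y) for x, y in dst]
    ms = sum(zs) / len(zs)  # centroid of source landmarks
    md = sum(zd) / len(zd)  # centroid of template landmarks
    cs = [z - ms for z in zs]
    cd = [z - md for z in zd]
    denom = sum(abs(z) ** 2 for z in cs)
    if denom == 0:
        raise ValueError("source points are all identical")
    # Closed-form least-squares solution for rotation+scale on centered points.
    a = sum(s.conjugate() * d for s, d in zip(cs, cd)) / denom
    b = md - a * ms  # translation that maps centroid to centroid
    return a, b


def apply_similarity(a, b, points):
    """Apply w = a*z + b to a list of (x, y) tuples."""
    return [((a * complex(x, y) + b).real, (a * complex(x, y) + b).imag) for x, y in points]


# Template landmarks, and a copy rotated by 30 degrees, scaled by 2 and shifted.
template = [(0, 0), (1, 0), (0, 1), (1, 1)]
a0, b0 = 2 * cmath.exp(1j * cmath.pi / 6), complex(5, -3)
observed = apply_similarity(a0, b0, template)

a, b = fit_similarity(observed, template)
print(abs(a), cmath.phase(a))  # scale and rotation that undo the distortion
print(apply_similarity(a, b, observed))  # approximately the template again
```

A talking-face generator produces frames in the original (unaligned) pose, which is why a lip reader fine-tuned on normalized crops may not transfer directly to them.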

For instructions on how to use the lip observer checkpoint, please refer to the README in the `evaluation` directory.

jtTATannn commented 10 months ago

Thank you so much!