Why is the WER 66% when I use your checkpoint to train the model?

jtTATannn commented 1 year ago

Hello! Thank you for your outstanding work. I ran into a problem: why is the WER 66% when I use your checkpoints to train the model? Is it because I didn't use your lip observer ckpt? How should I use the lip observer ckpt you provided?

Sxjdwang commented 1 year ago

The official AV-HuBERT checkpoint differs from the lip observer checkpoint in two main ways:

1. The official checkpoint is fine-tuned on LRS3, while the lip observer checkpoint is fine-tuned on LRS2.
2. The official checkpoint expects preprocessed input images in which the effects of rotation and scale have been removed. This preprocessing is not suitable for generating talking faces, so we fine-tuned the lip observer without it.
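To make the second point concrete, here is a minimal sketch of what "removing rotation and scale" can look like. It is not the AV-HuBERT preprocessing code: it assumes two detected eye positions and two reference eye positions, and the function name `similarity_from_eyes` is hypothetical. It fits a 2D similarity transform that maps the detected points onto the reference points, which cancels in-plane rotation and scale.

```python
import cmath


def similarity_from_eyes(left_eye, right_eye, ref_left, ref_right):
    """Fit a 2D similarity transform mapping detected eyes onto reference eyes.

    Hypothetical helper for illustration only. Points are (x, y) tuples.
    Returns (scale, angle_in_radians, apply), where apply maps any point
    from the input image into the normalized (reference) frame.
    """
    src_l, src_r = complex(*left_eye), complex(*right_eye)
    dst_l, dst_r = complex(*ref_left), complex(*ref_right)
    if src_l == src_r:
        raise ValueError("the two detected eye positions must differ")

    # A 2D similarity transform can be written as z -> a*z + b, where the
    # complex number a encodes scale and rotation and b encodes translation.
    a = (dst_r - dst_l) / (src_r - src_l)
    b = dst_l - a * src_l

    def apply(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return abs(a), cmath.phase(a), apply


# A face rotated by 90 degrees and twice as large as the reference.
scale, angle, apply = similarity_from_eyes((0, 0), (0, 2), (0, 0), (1, 0))
print(scale, angle, apply((0, 1)))
```

A model fine-tuned on frames normalized this way sees mouths at a fixed orientation and size, which generated talking-face frames do not guarantee. That mismatch is the motivation given above for fine-tuning the lip observer on unnormalized input.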

For instructions on how to use the lip observer checkpoint, please refer to the README in the "evaluation" directory.
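For readers comparing numbers, word error rate (WER) is the word-level edit distance between the reference transcript and the predicted transcript, divided by the number of reference words. The sketch below is a plain-Python illustration with a hypothetical function name `word_error_rate`; it is not the evaluation script from this repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Two of three reference words are wrong, so WER is about 0.67.
print(word_error_rate("a b c", "a x y"))
```

A WER around 66% therefore means roughly two word errors for every three reference words, which is the kind of gap a mismatch in fine-tuning data or input preprocessing can produce.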

jtTATannn commented 1 year ago

Thank you so much!

Sxjdwang / TalkLip #22