Closed Rookie-Kai closed 11 months ago
To train the lip-reading experts, please refer to https://github.com/facebookresearch/av_hubert/tree/main.
The lip-reading experts and the observer are fine-tuned on LRS2.
Note that AV-HuBERT has a pre-processing step that normalizes the input images. We do not use this step; instead, we only crop the center 88x88 patch from each image when fine-tuning the experts and observers.
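For anyone reproducing this, here is a minimal sketch of the center-crop step described above. It is not taken from the repository. The function name `center_crop`, the use of NumPy, the grayscale `(T, H, W)` layout, and the 96x96 input size are all assumptions made for illustration.

```python
import numpy as np


def center_crop(frames: np.ndarray, size: int = 88) -> np.ndarray:
    """Return the central size x size patch taken over the last two axes (H, W).

    Hypothetical helper: pixel values are left untouched (no normalization).
    """
    h, w = frames.shape[-2:]
    if h < size or w < size:
        raise ValueError(f"frames of {h}x{w} are smaller than the {size}x{size} crop")
    top = (h - size) // 2
    left = (w - size) // 2
    return frames[..., top:top + size, left:left + size]


# Assumed input: 5 grayscale mouth-region frames of 96x96 pixels.
clip = np.arange(5 * 96 * 96, dtype=np.float32).reshape(5, 96, 96)
cropped = center_crop(clip)
print(cropped.shape)
```

Because the crop is plain slicing, the pixel values pass through unchanged, which matches the statement that no normalization is applied.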
Thank you very much for your work. I would like to ask how you trained the lip-reading experts, since I could not find this in your code or paper. Did you fine-tune AV-HuBERT on LRS2 and use the resulting weights as the lip-reading expert weights? And what is the lip-reading observer weight responsible for? Thank you again for your work.