choijeongsoo / lip2speech-unit

[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units

In The Wild Videos Inference #8

votuongquan commented 2 months ago

First of all, congrats on your excellent work! Since there are no detailed instructions for data preparation or inference on in-the-wild videos, could you confirm whether the repo only works with the preprocessed LRS3/LRS2 datasets? Is it possible to run inference on arbitrary English-speaking videos (for example, ones picked at random from a YouTube channel), with a preparation step for these videos that is separate from the LRS3/LRS2 preparation? I ask because I have been stuck on this problem for a few days. I hope you can answer soon. Thank you so much!

choijeongsoo commented 2 months ago

Thank you for your interest in our work!

We tested internally with interview videos from YouTube, and our model could generate somewhat intelligible speech.

We plan to provide a complete pipeline for generating output from random videos and sample speech, but I think it will be a bit difficult for the time being.

You could send your downloaded video, sample speech audio, and generated output to my email, jeongsoo.choi@kaist.ac.kr, so that I can check whether the quality is as expected.