votuongquan opened 2 months ago
Thank you for your interest in our work!
We tested internally with interview videos from YouTube, and our model was able to generate somewhat intelligible speech.
We plan to provide a complete pipeline for generating output from arbitrary videos and sample speech, but that will be difficult for the time being.
You can send your downloaded video, the sample speech audio, and the generated output to my email (jeongsoo.choi@kaist.ac.kr), and I will check whether the quality is as expected.
First of all, congratulations on your excellent work! Since there are no detailed instructions for data preparation and inference on in-the-wild videos, could you please confirm whether the repo only works with the preprocessed LRS3/LRS2 datasets? Is it possible to run inference on arbitrary English-speaking videos (for example, ones picked at random from a YouTube channel), including a preparation phase for these videos that is separate from the LRS3/LRS2 preparation? I am asking because I have been stuck on this problem for a few days. I hope you can answer my questions soon. Thank you so much!