Open Nyquist0 opened 7 months ago
Could you provide the video and audio you employed to generate this video?
Sure. Attached zip includes source video, source audio, and generated video.
Thanks for helping to check that.
Hi @Sxjdwang, I have tested more samples, but the results were poor as well. I suspect this is because of the gap between the training dataset and my test data, which is in-the-wild.
Would you mind giving me some advice on reducing that gap? For example, does the resolution of the face area matter (although I assume you resize the cropped face region after detection)? For reference, the test video is 25 fps and the audio sample rate is 16 kHz.
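For reference, here is a minimal sketch of how test inputs can be brought to those settings (25 fps video, 16 kHz audio) before inference. It assumes `ffmpeg` is installed and on the PATH. The helper names and file paths are hypothetical and not part of TalkLip, and the mono downmix is my own assumption rather than a documented requirement.

```python
import subprocess


def build_video_cmd(src, dst, fps=25):
    """Build an ffmpeg command that re-encodes `src` at a fixed frame rate.

    -r sets the output frame rate; -an drops the audio track so the
    video and audio can be handled separately.
    """
    return ["ffmpeg", "-y", "-i", src, "-r", str(fps), "-an", dst]


def build_audio_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that extracts audio at a given sample rate.

    -vn drops the video stream, -ar sets the sample rate, and -ac 1
    downmixes to mono (an assumption, not a confirmed requirement).
    """
    return ["ffmpeg", "-y", "-i", src, "-vn",
            "-ar", str(sample_rate), "-ac", "1", dst]


def run(cmd):
    """Run a command and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)


# Example usage (hypothetical paths):
# run(build_video_cmd("input.mp4", "input_25fps.mp4"))
# run(build_audio_cmd("input.wav", "input_16k.wav"))
```

This only rules out a frame-rate or sample-rate mismatch; it does not address differences in face resolution or pose between the training data and in-the-wild footage.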
Dear Sir or Madam,
Thanks for open-sourcing this project. I really appreciate it.
However, I cannot get a reasonable result. Most of the time there is severe blur in the mouth area, as the following video shows.
https://github.com/Sxjdwang/TalkLip/assets/43435441/455d800b-31b2-40d5-9570-7e1793e7f101
I assume this is because only one reference identity input is used, and it must show either an open mouth or a closed mouth. As a result, within a single generation pass the network cannot capture the identity features of the face for both the open-mouth and closed-mouth states, which leads to heavy blur.
Please correct me if I am wrong.