Sxjdwang / TalkLip


the face in output video is blurred #9

Open ZardYuan opened 1 year ago

ZardYuan commented 1 year ago

Hi, thanks for your great work! I tested TalkLip with my own video, but the generated face in the output video is blurred and shows a clear border against the background. The resolution of my test video is 1600x900.

Sxjdwang commented 1 year ago

As stated in the paper, we detect the face in each image and resize it to 96×96, so the output image is also 96×96. It is then resized back to the original face size and embedded into the original image. For a high-resolution input, the blur and visible border you describe can occur.
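To illustrate why detail is lost, here is a minimal sketch of the crop, downscale, upscale, and paste round trip described above. It is not the TalkLip code: `resize_nearest` and `roundtrip_face` are hypothetical helpers, nearest-neighbour resampling stands in for whatever interpolation the repository actually uses, and the face box is assumed to come from a separate detector.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C array (stand-in for a real resampler)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows[:, None], cols]

def roundtrip_face(frame, box, size=96):
    """Crop the face box, shrink it to size x size, enlarge it again, paste it back."""
    x1, y1, x2, y2 = box                      # assumed output of a face detector
    crop = frame[y1:y2, x1:x2]
    small = resize_nearest(crop, size, size)  # resolution the generator works at
    # (a lip-sync model would synthesize the mouth region on `small` here)
    restored = resize_nearest(small, y2 - y1, x2 - x1)
    out = frame.copy()
    out[y1:y2, x1:x2] = restored              # hard paste, hence the visible border
    return out

# Example: a 1600x900 frame with a 400x400 face box
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(900, 1600, 3), dtype=np.uint8)
result = roundtrip_face(frame, (600, 200, 1000, 600))
```

In this example the 400x400 face region is rebuilt from only 96x96 samples, so it ends up softer than the untouched background around it, and the hard paste leaves a seam at the box edge.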

SUGE2016 commented 6 months ago

What needs to be done if we want to implement something like an HD TalkLip at 192x192? Thanks @Sxjdwang

Sxjdwang commented 6 months ago

Here is some advice:

1. Evaluate the performance of the lip-reading expert on lip reading, speech recognition, and audio-visual speech recognition using your dataset.
2. When testing lip reading or audio-visual speech recognition, resize the videos in your dataset so that the faces are similar in size to those in the LRS2 dataset. Additionally, crop a region of interest (ROI) centered on the face.
3. If the performance of lip reading and audio-visual speech recognition is poor, fine-tune the lip-reading expert. Since text annotations may not be available, the speech recognition results can be used as a substitute for them.
4. Once you have a fine-tuned lip-reading expert, you can proceed with training the TalkLip model.
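The resizing and cropping in step 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: `scale_and_roi` is a hypothetical helper, and the `target_face` and `roi` values are placeholders, not numbers taken from LRS2 or from the TalkLip repository. It only computes the geometry; the actual resampling and any padding for frames smaller than the ROI are left out.

```python
def scale_and_roi(frame_w, frame_h, face_box, target_face=96, roi=160):
    """Return (scale, scaled frame size, ROI box) for one detected face.

    target_face and roi are placeholder values chosen for illustration only.
    """
    x1, y1, x2, y2 = face_box
    face_size = max(x2 - x1, y2 - y1)
    scale = target_face / face_size          # factor that brings the face to target size
    new_w, new_h = round(frame_w * scale), round(frame_h * scale)

    # Face centre in the scaled frame
    cx = (x1 + x2) / 2 * scale
    cy = (y1 + y2) / 2 * scale

    # Centre the ROI on the face, then clamp it to the scaled frame
    left = int(min(max(cx - roi / 2, 0), max(new_w - roi, 0)))
    top = int(min(max(cy - roi / 2, 0), max(new_h - roi, 0)))
    return scale, (new_w, new_h), (left, top, left + roi, top + roi)

# Example: 1600x900 video with a 400x400 face box
scale, size, box = scale_and_roi(1600, 900, (600, 200, 1000, 600))
```

Resizing every frame by `scale` and then cropping `box` gives clips in which the face occupies roughly the same number of pixels across the dataset, which is the condition step 2 asks for before evaluating the lip-reading expert.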