Weizhi-Zhong / IP_LAP

CVPR2023 talking face implementation for Identity-Preserving Talking Face Generation With Landmark and Appearance Priors
Apache License 2.0

The mouth keeps moving during silent audio #5

Closed 1059692261 closed 1 year ago

1059692261 commented 1 year ago

Thank you very much for open-sourcing such an excellent project! I ran inference tests with my own video and audio and noticed a problem similar to wav2lip: during silent segments where the audio stops, the character's mouth still follows the original mouth shape (the mouth may be closed, but the corners of the mouth still twitch along with the original mouth movements). May I ask whether the author has any ideas for solving this problem? If silent data like this were added during training, would that help?

junleen commented 1 year ago

You could also just cut those segments out.

junleen commented 1 year ago

Just kidding, don't mind me.
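Joking aside, trimming the silent spans out of the driving audio (and the matching video frames) is a practical workaround. Below is a minimal sketch, not part of IP_LAP, that locates near-silent intervals using librosa; the `top_db` threshold and the file name are illustrative assumptions.

```python
# Minimal sketch (not part of IP_LAP): find near-silent intervals so they can
# be cut from both the audio and the template video before inference.
# The top_db threshold and file name are illustrative assumptions.
import librosa

def find_silent_segments(wav_path, top_db=30):
    """Return a list of (start_sec, end_sec) intervals that are near-silent."""
    y, sr = librosa.load(wav_path, sr=None)
    # librosa.effects.split returns the *non-silent* intervals in samples,
    # so the gaps between them are the silent spans we want.
    voiced = librosa.effects.split(y, top_db=top_db)
    silent, prev_end = [], 0
    for start, end in voiced:
        if start > prev_end:
            silent.append((prev_end / sr, start / sr))
        prev_end = end
    if prev_end < len(y):
        silent.append((prev_end / sr, len(y) / sr))
    return silent

print(find_silent_segments("driving_audio.wav"))
```

Intervals longer than, say, half a second could then be removed from both the audio and the template video (e.g. with ffmpeg) before running inference.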

Weizhi-Zhong commented 1 year ago

Hi, thanks for your attention and great observation. Methods that require a template video as input tend to face the problem you mentioned. In my view, this problem may have multiple causes. For example, the input template video during inference may affect the generated result: in Wav2Lip, upon careful examination, you can find that it uses the t-th frame of the input template video as the reference image to generate the t-th output frame, so the lip shape of the t-th frame in the input video may affect the generated lip shape. This may be caused by the skip connections of their U-Net-based network. Our framework does not use this design and avoids this issue. Another cause may be an inaccurate mapping from audio to landmarks/video when the audio is silent. It may be related to the training dataset, because it contains very few silent segments. But I am not sure if adding such data will work; you can try it~
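Since the audio-to-landmark mapping may drift during silence, one inference-time experiment is to blend the predicted mouth landmarks toward a closed-mouth template on frames that fall inside a detected silent interval. The sketch below is not part of IP_LAP; the 68-point mouth indices and the blending weight are illustrative assumptions.

```python
# Hedged sketch of an inference-time workaround (not part of IP_LAP):
# during detected silence, pull the predicted mouth landmarks toward a
# closed-mouth template so the lips stop twitching.
import numpy as np

MOUTH_IDX = np.arange(48, 68)  # mouth points in the common 68-landmark layout

def quiet_mouth(pred_landmarks, closed_mouth, is_silent, alpha=0.8):
    """
    pred_landmarks: (T, 68, 2) landmarks predicted from audio.
    closed_mouth:   (20, 2) mouth points taken from a closed-mouth frame.
    is_silent:      (T,) boolean mask, e.g. derived from a silence detector.
    alpha:          blend weight toward the closed-mouth template.
    """
    out = pred_landmarks.copy()
    for t in np.nonzero(is_silent)[0]:
        out[t, MOUTH_IDX] = alpha * closed_mouth + (1 - alpha) * out[t, MOUTH_IDX]
    return out
```

Whether this looks natural depends on how abruptly the blend kicks in; smoothing `alpha` over a few frames around each silence boundary would likely be needed.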

1059692261 commented 1 year ago

Thanks for the reply!