Chinese characters are spoken faster than English words, will this model work on Chinese?

Hangz-nju-cuhk / Talking-Face-Generation-DAVS

Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

MIT License

Chinese characters are spoken faster than English words, will this model work on Chinese? #51

Closed: zwfcrazy closed this issue 4 years ago

zwfcrazy commented 4 years ago

I want to build a dataset of Chinese characters to train this model. I ran speech recognition on some Chinese news videos (from CCTV). The recognition part was fine, but I found that Chinese characters are very short in terms of pronunciation time, because each of them has only one syllable. On average, the lip movement for a single Chinese character spans only 5 video frames (at 25 fps), and it can be as few as 2 frames. This is far fewer than the required 29 frames. Obviously, interpolation won't work well in this case. So I would like to know whether you have considered Chinese. Will this model work? Is there any workaround?
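To make the gap concrete, here is a small sketch of the arithmetic behind those numbers. It only uses the figures quoted above (25 fps, 29 required frames, 2 to 5 frames per character); the function names are made up for illustration and are not part of the repository.

```python
FPS = 25               # frame rate stated in the question
REQUIRED_FRAMES = 29   # clip length the question says the model requires


def frames_to_seconds(n_frames: int, fps: int = FPS) -> float:
    """Duration covered by n_frames at the given frame rate."""
    return n_frames / fps


def stretch_factor(n_frames: int, required: int = REQUIRED_FRAMES) -> float:
    """How many times a clip would have to be stretched to reach `required` frames."""
    return required / n_frames


for n in (2, 5, REQUIRED_FRAMES):
    print(f"{n:2d} frames = {frames_to_seconds(n):.2f} s, "
          f"stretch x{stretch_factor(n):.1f}")
```

A 5-frame character covers 0.20 s and would need to be stretched about 5.8 times to fill a 1.16 s, 29-frame clip; a 2-frame character would need 14.5 times. That is why interpolation looks unpromising here.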

Hangz-nju-cuhk commented 4 years ago

You can remove the recognition and adversarial parts of the model. It can then work regardless of language and input length. Although a crucial part is removed, I think this can still give reasonable results with acceptable performance. It would be better to load the pretrained weights of our model and then fine-tune on your dataset. However, you may need to modify the code (delete several parts, modify the input length) for it to work well.
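One way to picture the "load the pretrained weights, minus the removed parts" step is to filter the checkpoint's parameter names before loading. The sketch below is only an illustration under assumptions: the prefixes `word_classifier.` and `discriminator.` are hypothetical stand-ins, not the actual module names in this repository, and the checkpoint is modelled as a plain dictionary so the example runs without any deep learning library.

```python
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")

# Hypothetical name prefixes for the recognition and adversarial parts.
# The real names must be read from the actual checkpoint.
DROPPED_PREFIXES = ("word_classifier.", "discriminator.")


def drop_submodules(state: Dict[str, T],
                    prefixes: Iterable[str] = DROPPED_PREFIXES) -> Dict[str, T]:
    """Return a copy of `state` without entries whose key starts with any prefix."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return {k: v for k, v in state.items() if not k.startswith(prefixes)}


# Toy checkpoint: keys mimic parameter names, values stand in for tensors.
checkpoint = {
    "audio_encoder.conv1.weight": 1,
    "decoder.fc.weight": 2,
    "word_classifier.fc.weight": 3,
    "discriminator.fc.bias": 4,
}
kept = drop_submodules(checkpoint)
print(sorted(kept))
```

In PyTorch, a dictionary filtered this way could then be passed to `model.load_state_dict(kept, strict=False)`, so that the remaining layers start from the pretrained weights before fine-tuning.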

ak9250 commented 4 years ago

@zwfcrazy Have you tried https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose? It seems to work regardless of language.

@Hangz-nju-cuhk This paper, https://arxiv.org/pdf/2004.12992.pdf, cites this work and is able to handle head pose and speaker awareness.

Hangz-nju-cuhk commented 4 years ago

@ak9250 Thanks for the references. I am familiar with both of these papers and had even seen their videos before they were on arXiv. They are both great works. I would definitely recommend that researchers try the state-of-the-art models, as mine seems a little out of date now.

zwfcrazy commented 4 years ago

@ak9250 @Hangz-nju-cuhk Sorry for the late reply. Thank you both! I will close this issue for now.