ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
https://ziqiaopeng.github.io/synctalk/

Mouth tends to stay open on silences #64

Closed vemacs closed 4 months ago

vemacs commented 4 months ago

On every clip I train (including those with longer silences), the inference output has the mouth staying open during silences (max mel value of roughly -3.5 across all frequency bands).

https://github.com/ZiqiaoPeng/SyncTalk/assets/2669187/ad1d3b16-79e3-4aed-9853-10d2c02ae3dd

I was thinking of detecting silences in the input and then replacing the silent enc_auds with the ones right before the silence, where the lip is closed. Is this an architectural problem with the audio encoding, or does anybody have a resolution?
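A minimal sketch of the workaround described above, assuming per-frame log-mel features aligned with the per-frame audio encodings (the array layouts and the `patch_silent_frames` helper are hypothetical, not part of SyncTalk's API): mark a frame as silent when its max mel value across all bands falls below the ~-3.5 threshold mentioned above, then overwrite its encoding with the last non-silent frame's encoding.

```python
import numpy as np

def patch_silent_frames(enc_auds, mel, silence_thresh=-3.5):
    """Replace audio encodings of silent frames with the encoding of the
    last voiced frame, so the mouth holds its pre-silence (closed) pose.

    enc_auds: (T, D) per-frame audio features (hypothetical layout)
    mel:      (T, n_mels) log-mel frames aligned one-to-one with enc_auds
    """
    patched = enc_auds.copy()
    # A frame is "silent" when even its loudest mel band stays below the
    # threshold, i.e. the max over frequency bands is under silence_thresh.
    silent = mel.max(axis=1) < silence_thresh
    last_voiced = None
    for t in range(len(patched)):
        if silent[t]:
            # Carry the most recent voiced encoding forward (leading
            # silence with no prior voiced frame is left untouched).
            if last_voiced is not None:
                patched[t] = patched[last_voiced]
        else:
            last_voiced = t
    return patched
```

This only patches the inputs at inference time; it does not address whatever the model learned about silence during training.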

ZiqiaoPeng commented 4 months ago

You can record more silent segments in the training data to improve the visual quality.