This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
I was wondering if there is a way to train SyncNet with a larger context window, specifically 25 frames and 80 mel steps (80 mel steps correspond to 1 second of audio). It seems major changes to the architecture would be needed. Also, if you look closely, the Wav2Lip generator's speech encoder shares the same architecture as the SyncNet speech encoder, so would the generator need to output 25 frames before they are fed into the lip-sync discriminator?
Any tips on this would be appreciated. I think a larger context window could achieve even better sync.
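For reference, a minimal sketch of how the SyncNet input shapes would change when scaling the window from Wav2Lip's default (5 frames, 16 mel steps) to the proposed 25 frames / 80 mel steps. This assumes the standard Wav2Lip setup: 25 fps video, 96x96 face crops with the lower half fed to SyncNet, T frames concatenated channel-wise (3*T channels), and 80 mel frequency bins; the `syncnet_shapes` helper is hypothetical, not part of the repo.

```python
# Sketch (assumptions noted above): input-shape arithmetic for a
# SyncNet-style model at a larger temporal context window.

FPS = 25                # video frame rate assumed by Wav2Lip
MEL_STEPS_PER_SEC = 80  # 80 mel steps ~= 1 second of audio

def syncnet_shapes(n_frames, img_size=96):
    """Return (face_shape, mel_shape) as (C, H, W) tuples, no batch dim.

    Face crops are concatenated channel-wise (3 * n_frames channels) and
    only the lower half of each crop is used; the mel window spans the
    same time interval as the video frames.
    """
    mel_steps = round(n_frames / FPS * MEL_STEPS_PER_SEC)
    face = (3 * n_frames, img_size // 2, img_size)
    mel = (1, 80, mel_steps)  # 80 mel frequency bins
    return face, mel

# Default Wav2Lip window: 5 frames -> 16 mel steps
print(syncnet_shapes(5))    # ((15, 48, 96), (1, 80, 16))
# Proposed 1-second window: 25 frames -> 80 mel steps
print(syncnet_shapes(25))   # ((75, 48, 96), (1, 80, 80))
```

The jump from a 16-step to an 80-step mel window (and 15 to 75 face channels) is why the conv stacks would need restriding or extra layers: the existing encoders are sized to reduce exactly those default spatial/temporal extents down to a single embedding.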