DinoMan / speech-driven-animation


Clarity related to Sync Discriminator #54

Closed Aithu-Snehith closed 3 years ago

Aithu-Snehith commented 3 years ago

Hi, kudos for the great work. I am trying to apply your model to a custom dataset. Could you please clarify how you train the sync discriminator? As mentioned in the paper, the discriminator is trained with original clips as the in-sync class and mismatched clips as the out-of-sync class.

Is this discriminator trained jointly with the generator, or is it pre-trained?

If the sync discriminator is trained together with the generator and the other discriminators, what label is assigned to a generated video–audio pair when computing the loss that is propagated back?

Are the provided pretrained models trained with the sync discriminator described in your latest paper?

Thanks

DinoMan commented 3 years ago

The models provided correspond to the latest paper.

The synchronization discriminator (as the name suggests) is a discriminator, not a perceptual loss, so it is not pre-trained. The entire model is trained end-to-end. In-sync pairs are treated as real (label 1) by the discriminator. Out-of-sync pairs and pairs containing generated video are treated as fake (label 0) by the discriminator. All these details are in the latest paper.
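To illustrate the labeling scheme described above, here is a minimal sketch of the sync-discriminator and generator loss terms. This is not the repository's actual implementation: the function names and the plain-Python BCE are assumptions for illustration, and a real training loop would use framework tensors and batches.

```python
import math

def bce_loss(prediction, label):
    # Binary cross-entropy for a single scalar prediction in (0, 1).
    eps = 1e-7  # clamp to avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def sync_discriminator_loss(d_in_sync, d_out_of_sync, d_generated):
    # Discriminator targets, per the answer above:
    #   in-sync real pairs        -> label 1 (real)
    #   out-of-sync real pairs    -> label 0 (fake)
    #   pairs with generated video -> label 0 (fake)
    return (bce_loss(d_in_sync, 1.0)
            + bce_loss(d_out_of_sync, 0.0)
            + bce_loss(d_generated, 0.0)) / 3.0

def generator_sync_loss(d_generated):
    # Adversarial objective for the generator: it wants its
    # (audio, generated-video) pair to be scored as in-sync (label 1).
    return bce_loss(d_generated, 1.0)
```

Because the whole model is trained end-to-end, the generator's loss on the generated pair uses label 1 even though the discriminator's own update labels that same pair 0; this is the standard adversarial setup.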