Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs: https://synclabs.so

Batch norm instead of instance norm #275

Closed: sunwoo76 closed this issue 3 years ago

sunwoo76 commented 3 years ago

Hello. Your work is really helpful!

I want to modify your network, and I have a couple of questions.

  1. Why do you use batch normalization? GANs commonly use instance normalization rather than batch normalization. Is there a special reason you use it? (A minimal sketch contrasting the two appears after this list.)

  2. Why is adv_wt multiplied only by the perceptual loss, and not by the discriminator loss? I think the reason is that the model could otherwise learn the image-to-image mapping without attending to the audio features, so the generator's adversarial loss should be kept low. Is that right?
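
For context, here is a minimal PyTorch sketch (not code from this repository) contrasting the two normalization choices in a generator-style conv block; the block structure and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, norm="batch"):
    # BatchNorm2d shares statistics across the batch; InstanceNorm2d
    # normalizes each sample's feature maps independently, the usual
    # choice in image-to-image GANs such as CycleGAN.
    norm_layer = nn.BatchNorm2d(cout) if norm == "batch" else nn.InstanceNorm2d(cout)
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1),
        norm_layer,
        nn.ReLU(inplace=True),
    )

x = torch.randn(8, 16, 96, 96)  # (batch, channels, H, W)
print(conv_block(16, 32, norm="batch")(x).shape)     # torch.Size([8, 32, 96, 96])
print(conv_block(16, 32, norm="instance")(x).shape)  # torch.Size([8, 32, 96, 96])
```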

Rudrabha commented 3 years ago
  1. We initially trained Wav2Lip without the visual quality discriminator; that version had batch normalization in the generator. Once we had satisfactory sync quality, we added a visual quality discriminator, and in it we used conv blocks without any normalization. So batch norm is not used in the GAN discriminator. (A sketch of such a block follows this reply.)
  2. I don't fully understand this question. We do multiply a weight onto the adversarial loss, in this line. (See the loss-weighting sketch after this reply.)
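
For illustration, here is a hedged sketch of a discriminator conv block with no normalization layer, as described in point 1; the class name, kernel size, and activation slope are assumptions, not a verbatim copy of the repository's layers.

```python
import torch.nn as nn

class NoNormConv2d(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, cin, cout, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # Conv followed directly by the activation: no BatchNorm or
        # InstanceNorm in between, matching the description above.
        self.conv = nn.Conv2d(cin, cout, kernel_size, stride, padding)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        return self.act(self.conv(x))
```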
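And a hedged sketch of how such a weight can enter the total generator loss during training with the visual quality disc; the term names and weight values below are assumptions in the spirit of the training script, not the referenced line itself.

```python
import torch

# Stand-in scalars for the real per-batch losses.
l1_loss   = torch.tensor(0.05)  # pixel-level reconstruction loss
sync_loss = torch.tensor(0.30)  # expert lip-sync discriminator loss
gen_adv   = torch.tensor(0.70)  # generator's adversarial (perceptual) loss

syncnet_wt, disc_wt = 0.03, 0.07  # assumed weights; see hparams.py for the real ones

# The adversarial term is scaled by its own weight before summing, so the
# generator cannot chase visual realism at the expense of sync or fidelity.
total_gen_loss = (syncnet_wt * sync_loss
                  + disc_wt * gen_adv
                  + (1.0 - syncnet_wt - disc_wt) * l1_loss)
print(float(total_gen_loss))
```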