Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
https://synclabs.so

Please HELP me to Understand this one!! #246

Closed Sushana142 closed 3 years ago

Sushana142 commented 3 years ago

First of all, I really appreciate this work on the accurate lip-sync model. My question is: in the Wav2Lip model's `face_decoder_blocks`, the first `Conv2d` has 512 output channels, but the next `Conv2dTranspose` takes 1024 input channels. I don't quite understand how that is possible.

```python
self.face_decoder_blocks = nn.ModuleList([
    nn.Sequential(Conv2d(512, 512, kernel_size=1, stride=1, padding=0),),

    nn.Sequential(Conv2dTranspose(1024, 512, kernel_size=3, stride=1, padding=0),  # 3, 3
                  Conv2d(512, 512, kernel_size=3, stride=1, padding=1, residual=True),),
```

Sorry for the newbie question; it would be really helpful if you could help me understand this. Thanks in advance.

Sushana142 commented 3 years ago

Is there anyone who understood this????

Rudrabha commented 3 years ago

There is a skip connection from the encoder whose feature map is concatenated channel-wise with the decoder's output. That increases the number of channels.
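A minimal sketch of what the channel-wise concatenation does (the tensor names and spatial sizes here are illustrative, not taken from the Wav2Lip code): the decoder block outputs 512 channels, the saved encoder feature map contributes another 512, so the next `Conv2dTranspose` sees 1024 input channels.

```python
import torch

# Hypothetical shapes for illustration: (batch, channels, height, width)
decoder_feat = torch.randn(1, 512, 3, 3)  # output of the first decoder block
encoder_feat = torch.randn(1, 512, 3, 3)  # matching encoder feature map (skip connection)

# Concatenating along dim=1 (the channel axis) gives 512 + 512 = 1024 channels,
# which is why the next Conv2dTranspose is declared with 1024 input channels.
x = torch.cat([decoder_feat, encoder_feat], dim=1)
print(x.shape)  # torch.Size([1, 1024, 3, 3])
```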