Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
https://synclabs.so

Please HELP me to Understand this one!! #246

Closed Sushana142 closed 3 years ago

Sushana142 commented 3 years ago

First of all, I really appreciate this work on the accurate lip-sync model. My question is: in the Wav2Lip model's `face_decoder_blocks`, the first `Conv2d` has 512 output channels, but the next `Conv2dTranspose` takes 1024 input channels. I don't quite understand how that is possible.

```python
self.face_decoder_blocks = nn.ModuleList([
    nn.Sequential(Conv2d(512, 512, kernel_size=1, stride=1, padding=0),),

    nn.Sequential(Conv2dTranspose(1024, 512, kernel_size=3, stride=1, padding=0),  # 3, 3
                  Conv2d(512, 512, kernel_size=3, stride=1, padding=1, residual=True),),
```

Sorry for the newbie question; it would be really helpful if you could help me understand this. Thanks in advance.

Sushana142 commented 3 years ago

Is there anyone who understood this????

Rudrabha commented 3 years ago

There is a skip connection from the encoder whose feature map is concatenated channel-wise with the decoder's output. That increases the number of channels.
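A minimal sketch of what the channel-wise concatenation does (the tensor names and spatial sizes here are illustrative, not taken from the Wav2Lip code): the decoder block outputs 512 channels, the saved encoder feature map contributes another 512, so the next `Conv2dTranspose` sees 1024 input channels.

```python
import torch

# Hypothetical shapes for illustration: (batch, channels, height, width)
decoder_feat = torch.randn(1, 512, 3, 3)  # output of the first decoder block
encoder_feat = torch.randn(1, 512, 3, 3)  # matching encoder feature map (skip connection)

# Concatenating along dim=1 (the channel axis) gives 512 + 512 = 1024 channels,
# which is why the next Conv2dTranspose is declared with 1024 input channels.
x = torch.cat([decoder_feat, encoder_feat], dim=1)
print(x.shape)  # torch.Size([1, 1024, 3, 3])
```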