@chetanpandey1266 The code you are showing is for the Visual Quality Discriminator. For the Generator, the input is the reference image concatenated with the masked original image. The Visual Quality Discriminator, on the other hand, takes the original image and compares it against the generated frames.
(mel, [ref, orig with orig[96//2:, :] = 0]) ----> Generator ----> generated frames ----> sync loss + L1 loss
original image ----> Discriminator ----> real loss
generated frames ----> Discriminator ----> fake loss
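For what it's worth, here is a minimal PyTorch sketch of that flow; the tensor names and shapes are my own illustration (assuming the usual 96x96 face crops), not the actual repo code:

```python
import torch

B, C, H, W = 4, 3, 96, 96
ref = torch.rand(B, C, H, W)     # randomly sampled reference frame
orig = torch.rand(B, C, H, W)    # ground-truth target frame
mel = torch.rand(B, 1, 80, 16)   # mel-spectrogram chunk (assumed shape)

# Generator input: zero out the lower half of the target, then concatenate
# with the reference along the channel axis -> a 6-channel image.
masked = orig.clone()
masked[:, :, H // 2:, :] = 0.0
gen_input = torch.cat([ref, masked], dim=1)   # (B, 6, H, W)

# gen_frames = generator(mel, gen_input)   # trained with sync loss + L1 loss
# real_score = disc(orig)                  # -> real loss
# fake_score = disc(gen_frames)            # -> fake loss (disc never sees mel)
```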
Okay, my bad! Thanks for making it clear.
@chetanpandey1266 Ask as many questions as you want. Lots of things are still not clear to me either, but with more discussion we will get the idea, more and more.
In model/wav2lip.py, you have taken the lower half as the input to the encoder, while in the paper you take a randomly sampled image and the upper half as the input. Can you elaborate on this part?
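For anyone else reading, a small numpy illustration of what that lower-half indexing does (variable names here are hypothetical, not the repo's):

```python
import numpy as np

img = np.random.rand(96, 96, 3)   # H x W x C face crop (hypothetical)
masked = img.copy()
masked[96 // 2:] = 0              # zeroes rows 48..95, i.e. the LOWER half
# Only the upper half stays visible, so "masking the lower half" and
# "feeding the upper half (plus a random reference) to the encoder"
# describe the same input.
```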