Closed: Rookie-Kai closed this issue 11 months ago
The basic idea is to use the pretrained MARLIN encoder in place of the Wav2Lip face encoder. To do that, we increased the temporal frame window from 5 to 16 frames to match the input shape the MARLIN encoder expects, and we retrained SyncNet with 16 frames as well.
The decoder was also modified to fit the new feature dimensions. In addition, I resized the input image and concatenated it with the decoder feature maps as a replacement for the original U-Net skip connections.
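For anyone trying to reproduce the window change, here is a minimal sketch of how a 16-frame window and its matching audio span could be computed. It is not the modified code from this project. The helper names `frame_window` and `mel_span` are hypothetical, and the 25 fps video rate and 80 mel steps per second are assumptions taken from the usual Wav2Lip defaults, under which a 5-frame window maps to 16 mel steps.

```python
def frame_window(start, window=16):
    """Return the indices of `window` consecutive frames starting at `start`."""
    return list(range(start, start + window))


def mel_span(start_frame, window=16, fps=25, mel_per_sec=80):
    """Return the (start, end) mel-step range covering the same time span
    as the frame window.

    fps and mel_per_sec are assumed values (typical Wav2Lip defaults),
    not settings confirmed in this thread.
    """
    mel_start = int(mel_per_sec * start_frame / fps)
    mel_len = round(mel_per_sec * window / fps)
    return mel_start, mel_start + mel_len


if __name__ == "__main__":
    print(frame_window(0, window=5))   # original 5-frame window
    print(mel_span(0, window=5))       # audio span for 5 frames
    print(len(frame_window(0)))        # MARLIN-sized 16-frame window
    print(mel_span(0))                 # audio span for 16 frames
```

Under these assumptions the audio span grows along with the video window, so an audio encoder that expects a fixed 16-step mel input would also need adjusting. How that was handled in the retrained SyncNet is not stated here.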
Any plans to release the modified code?
Hello, and thank you very much for your work. While reading your paper, I saw that you tried using MARLIN in Wav2Lip, but I couldn't find anything in your project showing how that was done. Could you explain how you implemented it and what changes you made? I would like to try to reproduce it in Wav2Lip. Sorry to bother you when you are busy, and thank you again. This is really interesting work.