ControlNet / MARLIN

[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg
https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper

How to use it in wav2lip? #16

Closed. Rookie-Kai closed this issue 11 months ago.

Rookie-Kai commented 11 months ago

Hello, thank you very much for your work. While reading your paper, I saw that you tried using MARLIN in Wav2Lip, but I couldn't find anything in this repository showing how MARLIN is plugged into Wav2Lip. Could you explain how you implemented it at the time and what changes you made? I'd like to try to reproduce it in Wav2Lip. Sorry to bother you when you're busy, and thanks again. This is really interesting work.

ControlNet commented 11 months ago

The basic idea is to replace the Wav2Lip face encoder with the pretrained MARLIN encoder. To do that, we increased the temporal frame window from 5 to 16 frames to match the MARLIN encoder's input shape, and we retrained SyncNet on 16-frame windows as well.
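A minimal NumPy sketch of what the 5 to 16 frame change involves, assuming MARLIN takes channels-first (B, C, T, H, W) clips of 16 frames at 224x224 with 2x16x16 tubelet patches, and assuming Wav2Lip's default audio settings (16 kHz audio, hop size 200, 25 fps, i.e. 80 mel frames per second). These values and all function names here are assumptions for illustration, not code from either repository; check them against the configs you actually load.

```python
import numpy as np

# Assumed geometry (hypothetical; verify against your MARLIN / Wav2Lip configs):
CLIP_T = 16                # frames per window; Wav2Lip's SyncNet default is 5
IMG_SIZE = 224             # MARLIN face-crop resolution (assumed)
TUBELET = (2, 16, 16)      # (time, height, width) patch size (assumed)
MEL_PER_SEC = 16000 / 200  # Wav2Lip defaults: 16 kHz audio, hop size 200 -> 80.0
FPS = 25

def frame_windows(frames, t=CLIP_T, stride=1):
    """Split a (N, H, W, C) video into overlapping (t, H, W, C) windows."""
    n = frames.shape[0]
    if n < t:
        raise ValueError(f"need at least {t} frames, got {n}")
    return np.stack([frames[s:s + t] for s in range(0, n - t + 1, stride)])

def to_encoder_layout(windows):
    """(B, T, H, W, C) -> (B, C, T, H, W), a channels-first video layout."""
    return windows.transpose(0, 4, 1, 2, 3)

def num_tokens(t=CLIP_T, size=IMG_SIZE, tubelet=TUBELET):
    """Number of tokens a tubelet-based ViT encoder produces for one clip."""
    pt, ph, pw = tubelet
    return (t // pt) * (size // ph) * (size // pw)

def mel_window(start_frame, t=CLIP_T):
    """[start, end) mel-frame indices aligned with t video frames,
    using Wav2Lip's start-index convention (start = int(80 * frame / fps))."""
    start = int(MEL_PER_SEC * start_frame / FPS)
    return start, start + int(round(MEL_PER_SEC * t / FPS))

video = np.zeros((20, IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)
clips = to_encoder_layout(frame_windows(video))
print(clips.shape)       # (5, 3, 16, 224, 224)
print(num_tokens())      # 1568
print(mel_window(0, 5))  # (0, 16): the 5-frame SyncNet audio window
print(mel_window(0))     # (0, 51): audio window for a 16-frame clip
```

With these assumptions a 16-frame clip spans 0.64 s, or 51.2 mel frames, which is not a whole number, so a retrained 16-frame SyncNet needs some rounding choice for its audio window; rounding to 51 is just one option.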

The decoder was also modified to match the encoder's feature dimensions. In addition, instead of the original U-Net skip connections, I resized the input image and concatenated it with the decoder feature maps.
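One possible reading of the decoder change, sketched in NumPy: reshape the encoder's tubelet tokens back into per-frame spatial maps, then at each decoder scale concatenate a resized copy of the input face onto the feature maps where Wav2Lip would otherwise concatenate an encoder skip feature. The 6-channel 96x96 input (Wav2Lip's masked target plus reference frame), the (t, h, w) token ordering, the nearest-neighbour resize and the decoder scales are all hypothetical choices for illustration; the modified code was not published in this thread.

```python
import numpy as np

# Hypothetical assumptions: 16-frame clip, temporal tubelet size 2,
# 14x14 spatial patch grid, tokens ordered (t, h, w).
N_FRAMES, TUBELET_T, GRID = 16, 2, 14

def tokens_to_frame_maps(tokens: np.ndarray) -> np.ndarray:
    """(N, D) encoder tokens -> (N_FRAMES, D, GRID, GRID) per-frame feature maps.

    Each temporal patch covers TUBELET_T frames, so its map is repeated for them.
    """
    n, d = tokens.shape
    t_patches = N_FRAMES // TUBELET_T
    if n != t_patches * GRID * GRID:
        raise ValueError(f"expected {t_patches * GRID * GRID} tokens, got {n}")
    maps = tokens.reshape(t_patches, GRID, GRID, d).transpose(0, 3, 1, 2)
    return np.repeat(maps, TUBELET_T, axis=0)

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) array to (C, out_h, out_w)."""
    _, h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[:, rows[:, None], cols[None, :]]

def fuse_with_input(decoder_feat: np.ndarray, face: np.ndarray) -> np.ndarray:
    """Concatenate the resized input face onto (C, H, W) decoder features
    along the channel axis, in place of a U-Net skip connection."""
    _, h, w = decoder_feat.shape
    return np.concatenate([decoder_feat, resize_nearest(face, h, w)], axis=0)

# Toy run with random data standing in for real encoder features.
rng = np.random.default_rng(0)
frame_maps = tokens_to_frame_maps(rng.random((1568, 768), dtype=np.float32))
face = rng.random((6, 96, 96), dtype=np.float32)  # masked target + reference
for size in (24, 48, 96):                          # hypothetical decoder scales
    fused = fuse_with_input(np.zeros((32, size, size), np.float32), face)
    print(size, fused.shape)                       # 6 extra channels at each scale
print(frame_maps.shape)                            # (16, 768, 14, 14)
```

Concatenating the raw (resized) input gives the decoder direct access to the reference appearance at every scale without needing intermediate encoder activations, which a ViT encoder like MARLIN does not produce at multiple resolutions the way Wav2Lip's convolutional encoder does.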

JanFschr commented 7 months ago

Any plans to release the modified code?