ControlNet / MARLIN

[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg
https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper

How to use it in wav2lip? #16

Closed Rookie-Kai closed 7 months ago

Rookie-Kai commented 7 months ago

Hello, thank you very much for your work. In your paper I saw that you tried using MARLIN in wav2lip, but I couldn't find anywhere in this project how that was done. Could you explain how you implemented it at the time and what changes you made? I want to try to reproduce it in wav2lip. Sorry to bother you in the midst of your busy schedule, and thank you again. This is really very interesting work.

ControlNet commented 7 months ago

The basic idea is to replace the wav2lip face encoder with the pretrained MARLIN encoder. To do that, we increased the temporal frame window from 5 to 16 frames to match the input shape the MARLIN encoder expects, and we retrained the SyncNet with 16 frames as well.
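
For readers trying to reproduce this, here is a minimal sketch of what widening the window from 5 to 16 frames means for data sampling. It is not the authors' code. The helper names `frame_windows` and `mel_span` are hypothetical, and the audio side assumes the stock wav2lip settings of 25 fps video and 80 mel frames per second, which this thread does not confirm for the MARLIN variant.

```python
def frame_windows(num_frames, window=16, stride=1):
    """List every run of `window` consecutive frame indices in a clip.

    window=5 matches the original wav2lip setting mentioned above;
    window=16 matches the MARLIN encoder input length.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = num_frames - window
    return [list(range(s, s + window)) for s in range(0, last_start + 1, stride)]


def mel_span(start_frame, window=16, fps=25.0, mel_per_sec=80.0):
    """Mel-spectrogram index range covering the same time span as the frames.

    ASSUMPTION: fps and mel_per_sec are the stock wav2lip values, not
    numbers taken from this thread.
    """
    start = int(start_frame * mel_per_sec / fps)
    length = int(round(window * mel_per_sec / fps))
    return start, start + length


if __name__ == "__main__":
    windows = frame_windows(num_frames=20, window=16)
    print(len(windows), windows[0][0], windows[0][-1])
    print(mel_span(0, window=5), mel_span(0, window=16))
```

Under those assumed rates, a longer visual window also implies a proportionally longer audio window for the retrained SyncNet, so any code that hard-codes the 5-frame sizes would need to be revisited.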

The decoder was also modified to fit the new dimensions. In addition, I resized the input image and concatenated it with the decoder feature maps, as a replacement for the original U-Net skip connections.
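
The resize-and-concatenate step can be illustrated with a small, dependency-free sketch. This is only an illustration of the idea, not the authors' implementation: the function names `resize_nearest` and `concat_image` are hypothetical, nearest-neighbour interpolation is an assumption (the thread does not say which interpolation was used), and tensors are plain `C x H x W` nested lists so the example runs without any framework.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a C x H x W nested list.

    ASSUMPTION: the interpolation mode is chosen for simplicity only.
    """
    in_h, in_w = len(image[0]), len(image[0][0])
    return [
        [
            [channel[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)
        ]
        for channel in image
    ]


def concat_image(feature_map, image):
    """Resize `image` to the feature map's H x W and append its channels.

    This stands in for a U-Net skip connection: the decoder stage sees
    its own features plus a downscaled copy of the input image.
    """
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return feature_map + resize_nearest(image, h, w)


if __name__ == "__main__":
    # 3-channel 4x4 "image" and an 8-channel 2x2 decoder feature map.
    image = [[[c * 100 + y * 4 + x for x in range(4)] for y in range(4)] for c in range(3)]
    features = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(8)]
    fused = concat_image(features, image)
    print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a PyTorch model the same two steps would typically be `torch.nn.functional.interpolate` followed by `torch.cat` along the channel dimension, and the next decoder layer would need its input channel count increased by the number of image channels.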

JanFschr commented 3 months ago

Any plans to release the modified code?