ControlNet / MARLIN

[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg
https://openaccess.thecvf.com/content/CVPR2023/html/Cai_MARLIN_Masked_Autoencoder_for_Facial_Video_Representation_LearnINg_CVPR_2023_paper

How to finetune the pre-trained MARLIN encoder for a video classification task? #21

Closed. imxtx closed this issue 3 months ago

imxtx commented 7 months ago

Thank you for this great work! I have a video classification task. How can I fine-tune the pre-trained MARLIN encoder for it? My videos average 5 minutes in length. How should I prepare the dataset, and which parts of the code should I modify?

ControlNet commented 6 months ago

Hi, please check the finetune code for CelebV-HQ. I think it is feasible to adapt your dataset to it.
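For long videos like the 5-minute ones mentioned above, one common way to prepare the data is to split each video into short fixed-length clips and give every clip the video's label. The sketch below only computes the frame indices for such clips. The function names and the defaults (16 frames per clip, sampling every 2nd frame) are assumptions for illustration, not values taken from the MARLIN fine-tuning code, so check them against the CelebV-HQ config before use.

```python
def clip_starts(num_frames, clip_len=16, stride=16, sample_rate=2):
    """Start indices of all clips that fit entirely inside the video.

    clip_len, stride and sample_rate are illustrative defaults (assumptions).
    """
    # Number of source frames spanned by one clip when sampling every
    # `sample_rate`-th frame.
    span = (clip_len - 1) * sample_rate + 1
    if num_frames < span:
        return []
    return list(range(0, num_frames - span + 1, stride))


def clip_frame_indices(start, clip_len=16, sample_rate=2):
    """Frame indices that make up the clip beginning at `start`."""
    return [start + i * sample_rate for i in range(clip_len)]


if __name__ == "__main__":
    # A 5-minute video at 30 fps has 9000 frames.
    starts = clip_starts(9000)
    print(len(starts), "clips; first clip uses frames", clip_frame_indices(starts[0]))
```

At inference time, the per-clip predictions for one video can then be averaged to get a single video-level prediction.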

imxtx commented 6 months ago

Thank you very much. I'll look into it.

imxtx commented 6 months ago

Hi authors, I tried to preprocess my custom videos the same way as the CelebV-HQ dataset, but the output video jitters because the face-cropping bounding boxes differ from frame to frame. Is it okay to use these videos for fine-tuning?

ControlNet commented 6 months ago

Hi, you can try naive frame-level cropping first. If the fine-tuning performance is poor, you can try fixing the crop size to reduce the jitter.
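One possible way to fix the crop size, sketched under assumptions: take the per-frame face boxes from whatever detector is in use, pick a single square side length for the whole video, and smooth the box centers with a moving average. The function name `fixed_size_crops` and the `window` parameter are hypothetical and not part of the MARLIN preprocessing code.

```python
def fixed_size_crops(boxes, window=5):
    """Turn per-frame (x1, y1, x2, y2) boxes into same-sized, smoothed crops.

    Hypothetical helper: uses one square side for the whole video and a
    centered moving average over the box centers to reduce jitter.
    """
    if not boxes:
        return []
    # One side length for every frame: the largest box width or height seen.
    side = max(max(x2 - x1, y2 - y1) for x1, y1, x2, y2 in boxes)
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    half = window // 2
    crops = []
    for i in range(len(centers)):
        # The averaging window is truncated at the start and end of the video.
        lo, hi = max(0, i - half), min(len(centers), i + half + 1)
        cx = sum(c[0] for c in centers[lo:hi]) / (hi - lo)
        cy = sum(c[1] for c in centers[lo:hi]) / (hi - lo)
        crops.append((cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2))
    return crops
```

This sketch does not clamp the crops to the image bounds, so crops near the frame edge would still need padding or clipping before use.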

imxtx commented 6 months ago

I want to download the pretrained model. May I ask what "Encoder MACs" means? Thank you!

ControlNet commented 6 months ago

MACs stands for multiply-accumulate operations. It measures the computational cost (time complexity) of the model.
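As a rough illustration of how MACs are counted: a fully connected layer performs one multiply-accumulate per weight for each input token, so its cost is `in_features * out_features * num_tokens`. The helper below is a hypothetical sketch, and the 768 and 3072 widths are just typical ViT-Base sizes used as an example, not figures from the MARLIN model table.

```python
def linear_macs(in_features, out_features, num_tokens=1):
    """MACs of a fully connected layer applied to `num_tokens` tokens.

    Each output value needs `in_features` multiply-accumulates; bias
    additions are ignored, as is common when reporting MACs.
    """
    return in_features * out_features * num_tokens


if __name__ == "__main__":
    # Example: a transformer MLP block (768 -> 3072 -> 768) on a single token.
    mlp = linear_macs(768, 3072) + linear_macs(3072, 768)
    print(f"MLP block: {mlp:,} MACs per token")
```

One MAC is usually counted as two floating-point operations (a multiply and an add), so FLOPs are roughly twice the MACs. The count is a hardware-independent estimate and does not translate directly into wall-clock latency.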

imxtx commented 6 months ago

Thank you very much.