VideoMAC

Official code for the CVPR 2024 paper “VideoMAC: Video Masked Autoencoders Meet ConvNets”
https://arxiv.org/abs/2402.19082
MIT License

How to apply it to other downstream tasks: #2

asher-bit commented 3 weeks ago

Hello, thank you very much for your work. I would like to try applying this network to other downstream tasks. Do I need to retrain the network? Could you please provide the pre-trained network model? During inference, do I only need to use the target encoder and connect it to the corresponding task decoder? Thank you very much!

PGSmall commented 3 weeks ago

Please refer to videomac.

asher-bit commented 3 weeks ago

Thank you for your response. I apologize, as I am a beginner in this area. Is my understanding correct that you apply masking to the input during the pre-training phase to obtain a pre-trained model, and that during inference you can directly use this pre-trained model to extract feature vectors?

PGSmall commented 3 weeks ago

That's right. At inference you only need to extract features for the video frames with the pre-trained VideoMAC encoder; downstream tasks such as VOS can then be performed with the label propagation protocol from CRW.
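
For illustration, here is a minimal sketch of that inference workflow (not the repository's official API): load a pre-trained encoder, extract per-frame features, and run one CRW-style label-propagation step. The checkpoint path `videomac_pretrained.pth`, the `target_encoder` key, and the timm ConvNeXt backbone are assumptions made for this sketch; adapt them to the actual released checkpoint and model definition.

```python
import torch
import torch.nn.functional as F
import timm

# Assumption: the target encoder is a ConvNeXt backbone whose weights can be
# mapped onto a timm ConvNeXt; adjust the model name and key names as needed.
encoder = timm.create_model("convnext_base", pretrained=False, features_only=True)
ckpt = torch.load("videomac_pretrained.pth", map_location="cpu")  # hypothetical path
state = ckpt.get("target_encoder", ckpt)                          # hypothetical key
encoder.load_state_dict(state, strict=False)
encoder.eval()

@torch.no_grad()
def frame_features(frames):
    """frames: (T, 3, H, W), ImageNet-normalized. Returns (T, C, h, w)."""
    feats = encoder(frames)[-1]        # last-stage feature map
    return F.normalize(feats, dim=1)   # L2-normalize channels for cosine affinity

@torch.no_grad()
def propagate_labels(feats, labels, temperature=0.07, topk=10):
    """One CRW-style propagation step: push frame-0 labels onto frame 1.

    feats:  (T, C, h, w) normalized features.
    labels: (K, h, w) one-hot masks for the first frame, at feature resolution.
    Returns soft labels (K, h, w) for the second frame.
    """
    _, _, h, w = feats.shape
    src = feats[0].flatten(1)               # (C, h*w) source-frame features
    tgt = feats[1].flatten(1)               # (C, h*w) target-frame features
    aff = (tgt.t() @ src) / temperature     # (h*w, h*w) cosine affinities
    val, idx = aff.topk(topk, dim=1)        # keep top-k source pixels per target pixel
    weights = F.softmax(val, dim=1)         # (h*w, topk)
    lbl = labels.flatten(1)                 # (K, h*w)
    gathered = lbl[:, idx]                  # (K, h*w, topk) source labels per target pixel
    out = (gathered * weights.unsqueeze(0)).sum(-1)  # weighted vote
    return out.view(-1, h, w)
```

The full CRW evaluation protocol additionally propagates from a queue of previous frames and restricts affinities to a local spatial neighborhood; the sketch above only shows a single propagation step between two frames.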

asher-bit commented 3 weeks ago

I understand, thank you for your response!