Open asher-bit opened 4 months ago
Thank you for your response. I apologize, as I am a beginner in this area. Do I understand correctly that you applied masking to the input during the pre-training phase to obtain a pre-trained model, and that during the inference phase this pre-trained model can be used directly to obtain feature vectors?
That's right. The inference phase only requires extracting the features of the video frames with the VideoMAC model; downstream tasks such as VOS (video object segmentation) can then be performed using the label propagation method (CRW).
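To make the second step more concrete, here is a minimal sketch of the general idea behind label propagation between two frames. It is not the CRW or VideoMAC code: the function name `propagate_labels`, the `topk` and `temperature` values, and the flattened `(positions, channels)` feature layout are all assumptions for illustration, and the features would in practice come from the pre-trained encoder rather than from hand-made arrays.

```python
import numpy as np

def propagate_labels(feat_ref, labels_ref, feat_tgt, topk=5, temperature=0.07):
    """Hypothetical helper: copy labels from a reference frame to a target frame.

    feat_ref:   (N, C) features of the N reference-frame positions
    labels_ref: (N, K) one-hot or soft labels of the reference positions
    feat_tgt:   (M, C) features of the M target-frame positions
    returns:    (M, K) soft labels for the target positions
    """
    # L2-normalize so the dot product below is a cosine similarity.
    ref = feat_ref / (np.linalg.norm(feat_ref, axis=1, keepdims=True) + 1e-8)
    tgt = feat_tgt / (np.linalg.norm(feat_tgt, axis=1, keepdims=True) + 1e-8)
    sim = tgt @ ref.T  # (M, N) affinity between target and reference positions

    # Keep only the k most similar reference positions for each target position.
    k = min(topk, sim.shape[1])
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]          # (M, k)
    top = np.take_along_axis(sim, idx, axis=1) / temperature   # (M, k)

    # Softmax over the kept neighbours (shifted for numerical stability).
    top = top - top.max(axis=1, keepdims=True)
    weights = np.exp(top)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted average of the neighbours' labels.
    return np.einsum("mk,mkc->mc", weights, labels_ref[idx])
```

For VOS, `labels_ref` would hold the object masks of the annotated first frame (or of previously predicted frames), and taking the `argmax` over the last axis of the result gives a hard mask for the target frame.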
I understand, thank you for your response!
Hello, thank you very much for your work. I would like to try applying this network to other downstream tasks. Do I need to retrain the network? Could you please provide the pre-trained network model? During inference, do I only need to use the target encoder and connect it to the corresponding task decoder? Thank you very much!