Open Chuan-shanjia opened 4 months ago
Your answer is really helpful, thank you! If I want to apply the model to video domains other than action recognition, will it help to continue pretraining (stage1) on those videos? Or do you have any suggestions for improving performance in other video domains? Looking forward to your reply!
Sorry for the late response. You can use the models with masked pretraining.
Hello! I'm very interested in your great work! I have two questions about pretraining. First, does the generalization ability of UMT come from CLIP? If so, then regardless of which pre-training dataset is used, training is essentially about approaching the effectiveness of the open-source CLIP weights. So is the choice of pretraining dataset in stage1 important? Second, is the pre-training in stage2 helpful for visual-only tasks? If we finetune a stage2 pretrained model on a visual-only dataset, will it outperform a stage1 pretrained model? Looking forward to your reply!