Can I fine-tune it on a video dataset of 32 frames?

MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

https://arxiv.org/abs/2203.12602



Open Ha0Tang opened 1 year ago

Ha0Tang commented 1 year ago

The VideoMAE pre-trained models are trained on datasets with 16 frames per video. Can I fine-tune them on a video dataset with 32 frames per video?