Hi, we don't want to offer too many options for a ViT variant of the same size, since that might confuse our users.
You could perhaps use InternVideo's model instead. It was pre-trained for 800 epochs on the same UnlabeledHybrid dataset without dual masking, and its performance is close to that of our VideoMAE V2-B, which was pre-trained for 1200 epochs with dual masking.