MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

What are the finetuning differences between ViT-B 80% acc and 81% acc? #100

Open Vickeyhw opened 1 year ago

Vickeyhw commented 1 year ago

In the Kinetics-400 performance table in MODEL_ZOO.md, what improves the top-1 accuracy of ViT-B from 80.0% to 81.0%?