MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

About the batch_size #14

Closed LGYoung closed 2 years ago

LGYoung commented 2 years ago

Hi, thank you for the awesome work. I have a question about the batch size. Specifically, when pre-training ViT-B on K400, the script sets batch_size to 32, which means 32 videos per GPU. If one video clip consists of 16 frames (as set), one GPU needs to process 32 × 16 = 512 frames per iteration. Is this right, or am I misunderstanding something? Another question: your paper reports the batch size as 1024 when pre-training ViT-B, which seems inconsistent with the scripts here.
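For reference, here is a minimal sketch of the arithmetic behind both questions, using only the numbers quoted above (32 clips per GPU, 16 frames per clip, 1024 in the paper); the implied GPU count is an assumption for illustration, not something confirmed by the repo:

```python
# Sketch of the batch-size arithmetic in question.
# batch_size_per_gpu and frames_per_clip come from the script/paper settings
# quoted above; the GPU count derived below is hypothetical.

batch_size_per_gpu = 32   # per-GPU batch size from the pre-training script
frames_per_clip = 16      # frames per video clip, as set in the paper

# Frames processed by one GPU per iteration (the 512 figure in the question).
frames_per_gpu = batch_size_per_gpu * frames_per_clip
print(frames_per_gpu)  # 512

# If the paper's 1024 refers to the effective (global) batch size across
# all GPUs, it would correspond to this many GPUs in total:
paper_batch_size = 1024
implied_num_gpus = paper_batch_size // batch_size_per_gpu
print(implied_num_gpus)  # 32
```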