MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

Pretraining time #5

Closed jzhang38 closed 2 years ago

jzhang38 commented 2 years ago

Hi, thanks for your solid work!

Would you mind sharing some pretraining time statistics? For example, how long does it take to pretrain on Kinetics-400 using 64 V100s?

yztongzhan commented 2 years ago

Hi @jzhang38! Thanks for your question! It takes about 27 hours to pre-train our VideoMAE (ViT-B) for 800 epochs on Kinetics-400 using 64 Tesla V100 GPUs. If you find that GPU utilization is not high enough, please reduce the per-GPU batch size to ease the pressure on your CPUs and I/O. Setting --batch_size to 16 can also give favorable results.
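For reference, `--batch_size` here is the per-GPU batch size, so the effective global batch is still large when training across many GPUs. A minimal sketch of the arithmetic is below; only the 64-GPU count and the batch size of 16 come from this thread, and the gradient-accumulation factor is an assumed default, not the repo's exact recipe.

```python
# Rough effective-global-batch arithmetic for multi-GPU pre-training.
# Only the 64-GPU and batch-size-16 figures come from this thread;
# the accumulation factor is an assumption for illustration.
gpus = 64            # Tesla V100s mentioned above
per_gpu_batch = 16   # suggested value for --batch_size
grad_accum = 1       # assumed: no gradient accumulation

global_batch = gpus * per_gpu_batch * grad_accum
print(global_batch)  # 1024 clips per optimization step
```

So even with the smaller per-GPU setting, each optimization step still sees on the order of a thousand clips, which is why the reduced batch size can still give favorable results while easing the data-loading bottleneck.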