Closed yoon28 closed 2 years ago
Hi, thank you for the nice work.
I have a question: does batch size matter when training a model? If someone only has two 24 GB GPUs, what would be a good choice of batch size in that case? Can I apply the equation that appears in Appendix B to compute the learning rate for a small batch size, such as 2 or 4, or even 1?
Hi @yoon28! Thanks for your interest in our VideoMAE! We conducted the experiments with 64 GPUs. I think it is difficult to pre-train the ViTs with two 24 GB GPUs on ImageNet, and it is even harder to pre-train them on video datasets.
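For reference, a minimal sketch of how the learning rate could be scaled to the effective batch size, assuming the common linear scaling convention `lr = base_lr * total_batch_size / 256` (the base learning rate and batch sizes below are illustrative placeholders, not the paper's settings):

```python
# Sketch of the linear LR scaling rule, assuming
# lr = base_lr * total_batch_size / 256.
# All concrete values here are placeholders, not the paper's settings.

def scaled_lr(base_lr: float, batch_size_per_gpu: int,
              num_gpus: int, accum_steps: int = 1) -> float:
    """Scale the base learning rate by the effective total batch size."""
    total_batch_size = batch_size_per_gpu * num_gpus * accum_steps
    return base_lr * total_batch_size / 256


# Example: two GPUs with a per-GPU batch size of 4, using gradient
# accumulation over 8 steps to emulate a larger effective batch of 64.
print(scaled_lr(base_lr=1.5e-4, batch_size_per_gpu=4, num_gpus=2, accum_steps=8))
```

With very small per-step batches, gradient accumulation is one way to approximate a larger effective batch on limited hardware; the learning rate is then scaled by the accumulated total rather than the per-step batch size.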