Hi,
Thank you for the awesome work.
I have a question about the batch size. Specifically, when pre-training ViT-B on K400, the script sets batch_size to 32, which I take to mean 32 videos per GPU. If one video clip consists of 16 frames (as set in the script), each GPU will need to process 32*16=512 frames per step.
Is this right, or am I misunderstanding something?
Another question: your paper reports a batch size of 1024 when pre-training ViT-B, which seems inconsistent with the batch_size of 32 in the scripts here.
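
To make my arithmetic concrete, here is a small sketch of how I am reading the numbers. The variable names are my own, and the interpretation of 1024 as a total across all GPUs is only my guess, not something I have confirmed from the paper or the scripts.

```python
# Values as I read them from the pre-training script.
batch_size_per_gpu = 32   # videos per GPU (the script's batch_size)
frames_per_clip = 16      # frames in one video clip

# Frames a single GPU would process per step under my reading.
frames_per_gpu = batch_size_per_gpu * frames_per_clip
print(frames_per_gpu)  # 512

# Assumption (mine): the 1024 in the paper is the total batch size
# summed over all GPUs, with no gradient accumulation.
paper_batch_size = 1024
implied_num_gpus = paper_batch_size // batch_size_per_gpu
print(implied_num_gpus)  # 32
```

If that assumption holds, the two numbers would agree when training on 32 GPUs, but I would appreciate confirmation.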