Hi @ShixuanGu, thanks for your attention to our work. We train our models on 32 A100 GPUs with a total batch size of $128\times32=4096$. It takes around 3-4 minutes to train MLLA-T for one epoch. You may consider turning on --amp and increasing the batch size to speed up training.
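For reference, here is a minimal sketch of the kind of mixed-precision training step that --amp typically enables (the tiny model and random data below are illustrative stand-ins, not our actual training code):

```python
# Minimal, self-contained AMP training step in PyTorch.
# The model, data, and hyper-parameters are placeholders for illustration.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

images = torch.randn(128, 3, 224, 224, device=device)   # per-GPU batch of 128
targets = torch.randint(0, 1000, (128,), device=device)

optimizer.zero_grad()
# Forward pass under autocast: fp16 matmuls cut memory use and step time.
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    outputs = model(images)
    loss = criterion(outputs, targets)
# Scale the loss so fp16 gradients do not underflow, then step and update.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```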
Many thanks for the clarification. The 4096 batch size is 4 times larger than the VMamba setting (1024 in their case); would it be possible to release a 1024 batch size result for a comprehensive comparison? (32 A100s with a batch size of 4096 is a little too luxurious :) )
Hi @ShixuanGu, a 4096 batch size is a commonly adopted hyper-parameter employed by many classical works, such as ViT and ConvNeXt. I think batch size has little impact on the model's performance, and we may provide a 1024 batch size result in the future.
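For context, if a run at 1024 is done, a common convention (the linear LR-scaling rule; this is a general practice, not a statement about our exact recipe, and the base learning rate below is purely illustrative) is to rescale the learning rate with the total batch size:

```python
# Illustrative linear scaling of the learning rate with total batch size.
base_lr = 1e-3          # placeholder lr, assumed tuned at batch size 4096
base_batch = 128 * 32   # 4096, as reported above
new_batch = 1024        # the VMamba-style setting
scaled_lr = base_lr * new_batch / base_batch
print(scaled_lr)        # 0.00025
```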
Could you kindly report the average training time per epoch with 8 A100s?
Somehow my training time per epoch with 4 A100s was around 20 minutes, and it increased to 27 minutes after 80 epochs.
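For a rough sanity check, scaling your 32-A100 number down assuming near-linear throughput scaling (which real multi-GPU runs rarely achieve exactly):

```python
# Back-of-envelope estimate: ~3.5 min/epoch on 32 A100s, scaled linearly.
minutes_per_epoch_32 = 3.5
for gpus in (8, 4):
    print(gpus, "A100s ->", minutes_per_epoch_32 * 32 / gpus, "min/epoch")
# 8 A100s -> 14.0 min/epoch; 4 A100s -> 28.0 min/epoch
```

so my ~20 min/epoch on 4 A100s is in the right ballpark, though the slowdown to 27 minutes after 80 epochs is still puzzling.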