Hi @ShixuanGu, thanks for your attention to our work. We train our models on 32 A100 GPUs with a total batch size of $128\times32=4096$. It takes around 3-4 minutes to train MLLA-T for one epoch. You may consider turning on --amp and increasing the batch size to speed up training.
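For reference, here is a minimal sketch of the kind of mixed-precision training step that --amp typically enables (the tiny model and random data below are illustrative stand-ins, not our actual training code):

```python
# Minimal, self-contained AMP training step in PyTorch.
# The model, data, and hyper-parameters are placeholders for illustration.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

images = torch.randn(128, 3, 224, 224, device=device)   # per-GPU batch of 128
targets = torch.randint(0, 1000, (128,), device=device)

optimizer.zero_grad()
# Forward pass under autocast: fp16 matmuls cut memory use and step time.
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    outputs = model(images)
    loss = criterion(outputs, targets)
# Scale the loss so fp16 gradients do not underflow, then step and update.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```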
Many thanks for the clarification. The 4096 batch size is 4 times larger than the VMamba setting (1024 in their case); would it be possible to release a 1024 batch size result for a comprehensive comparison? (32 A100s with a batch size of 4096 is a little too luxurious :) )
Hi @ShixuanGu, a 4096 batch size is a commonly adopted hyper-parameter employed by many classical works, such as ViT and ConvNeXt. I think batch size has little impact on the model's performance, and we may provide a 1024 batch size result in the future.
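For context, if a run at 1024 is done, a common convention (the linear LR-scaling rule; this is a general practice, not a statement about our exact recipe, and the base learning rate below is purely illustrative) is to rescale the learning rate with the total batch size:

```python
# Illustrative linear scaling of the learning rate with total batch size.
base_lr = 1e-3          # placeholder lr, assumed tuned at batch size 4096
base_batch = 128 * 32   # 4096, as reported above
new_batch = 1024        # the VMamba-style setting
scaled_lr = base_lr * new_batch / base_batch
print(scaled_lr)        # 0.00025
```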
Could you kindly report the average training time per epoch with 8 A100s?
Somehow my training time per epoch with 4 A100s was around 20 minutes, and it increased to 27 minutes after 80 epochs.
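For a rough sanity check, scaling your 32-A100 number down assuming near-linear throughput scaling (which real multi-GPU runs rarely achieve exactly):

```python
# Back-of-envelope estimate: ~3.5 min/epoch on 32 A100s, scaled linearly.
minutes_per_epoch_32 = 3.5
for gpus in (8, 4):
    print(gpus, "A100s ->", minutes_per_epoch_32 * 32 / gpus, "min/epoch")
# 8 A100s -> 14.0 min/epoch; 4 A100s -> 28.0 min/epoch
```

so my ~20 min/epoch on 4 A100s is in the right ballpark, though the slowdown to 27 minutes after 80 epochs is still puzzling.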