OpenGVLab / VideoMamba

VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0

GPU usage during pre-training and fine-tuning phases. #24

Closed XuecWu closed 2 months ago

XuecWu commented 2 months ago

Thank you for your great work! As the title says, I would like to know the GPU usage during the pre-training and fine-tuning phases, especially fine-tuning. The 64 A100 GPUs mentioned for the pre-training stage in the paper are far beyond my resources. Looking forward to your reply.

Andy1621 commented 2 months ago

Hi! For pre-training, since it usually requires more epochs (e.g., 200 epochs on K400 and 10 epochs on the 25M data), we use 32 A100s for VideoMamba-M. For fine-tuning, we use 16 A100s, though we use more GPUs at larger resolutions to speed up training.
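
In case it helps others with fewer GPUs: below is a minimal sketch (not the repo's actual script) of keeping the effective batch size of such a recipe fixed via gradient accumulation. The per-GPU batch sizes are placeholders, not the exact values from the paper.

```python
# Minimal sketch: match a multi-GPU recipe's effective batch size on fewer GPUs
# by accumulating gradients. All batch-size numbers here are placeholders,
# not the exact values from the VideoMamba recipe.

reference_gpus = 32           # GPUs used in the original pre-training run
reference_batch_per_gpu = 8   # hypothetical per-GPU batch size
effective_batch = reference_gpus * reference_batch_per_gpu  # 256 in this example

available_gpus = 8            # what you actually have
batch_per_gpu = 8             # keep the per-GPU batch the same if memory allows

# Accumulate gradients so that
#   available_gpus * batch_per_gpu * accum_steps == effective_batch
accum_steps = effective_batch // (available_gpus * batch_per_gpu)
print(f"Set the gradient-accumulation steps in your training script to {accum_steps}")
```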

BTW, all the pre-trained models are released, and we hope researchers can fine-tune our checkpoints for downstream tasks or other domains.
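
For anyone starting from those released weights, a rough sketch of loading a pre-trained checkpoint before fine-tuning could look like the following; the checkpoint file name, model registry name, and key layout are assumptions, so check them against the repo's MODEL_ZOO and model definitions.

```python
import torch
from timm.models import create_model  # assuming the repo registers models with timm

# Placeholder names: substitute the actual checkpoint file and model name
# from the repo's MODEL_ZOO / model definitions.
ckpt = torch.load("videomamba_m_pretrained.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # weights are sometimes nested under "model"

model = create_model("videomamba_middle", num_classes=400)  # hypothetical registry name
msg = model.load_state_dict(state_dict, strict=False)       # the head may not match
print("Missing keys:", msg.missing_keys)
print("Unexpected keys:", msg.unexpected_keys)
```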

XuecWu commented 2 months ago

Got it. Thank you for your reply!