RenShuhuai-Andy / TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
https://arxiv.org/abs/2312.02051
BSD 3-Clause "New" or "Revised" License

When conducting SFT experiments, setting batch_size_train to 1 or 2 results in the same memory usage. #27

Open tiesanguaixia opened 4 months ago

tiesanguaixia commented 4 months ago

Thank you for your excellent paper and open source code. I would like to ask when using 4 * V100 GPU for instruction tuning on the TimeChat model, I set world_size==4 and accum_grad_iters==8 unchanged, but when batch_size_train is set to 1 or 2, the memory usage seems to be the same, all almost filling up the memory of every V100 GPU. What is the reason for this? Thank you a lot!