NVlabs / EAGLE

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
https://arxiv.org/pdf/2408.15998
Apache License 2.0

Training Time #4

Closed. HashmatShadab closed this issue 2 months ago.

HashmatShadab commented 2 months ago

Thanks for sharing your work! Could you share more details about the training time for each stage and the compute resources you used?

flyinglynx commented 2 months ago

Thank you for your interest!

We use 32 A100 GPUs for training. You can reduce the GPU requirements by using gradient accumulation and DeepSpeed ZeRO-3, though this will increase the training time.
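As a rough illustration, the sketch below shows how you might keep the global batch size fixed while dropping below 32 GPUs by raising the gradient accumulation steps; the per-device and global batch sizes used here are hypothetical, not the values from the released EAGLE training scripts.

```python
# Illustrative sketch: keep the effective (global) batch size fixed when
# moving from 32 GPUs to a smaller setup by raising gradient accumulation.
# The batch sizes below are assumptions for the example, not EAGLE's defaults.

def grad_accum_steps(global_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Gradient accumulation steps so that
    per_device_batch * num_gpus * steps == global_batch."""
    denom = per_device_batch * num_gpus
    if global_batch % denom != 0:
        raise ValueError("global batch must be divisible by per_device_batch * num_gpus")
    return global_batch // denom

# Hypothetical global batch of 256 with 8 samples per GPU:
print(grad_accum_steps(256, 8, 32))  # 1 -> the original 32-GPU setup
print(grad_accum_steps(256, 8, 8))   # 4 -> 8 GPUs, 4x accumulation
```

Combined with a DeepSpeed ZeRO-3 config, which shards optimizer states, gradients, and parameters across the available GPUs, this lets a smaller cluster fit the same model, with the trade-off of longer wall-clock training time.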

In our experience, pretraining takes approximately 2 hours for 0.6 million samples. SFT requires about 24 hours for a 7B model and around 32 hours for a 13B model on 1.8 million samples.

HashmatShadab commented 2 months ago

Thanks for sharing the information. The A100 GPUs you used are the 80GB variant, right?

Chrisding commented 2 months ago

Yes, that is correct.