OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

How many A100s did training require, and if I want to train on V100s, how many would be needed? #4

Closed. Yang-bug-star closed this issue 3 months ago.

JunZhan2000 commented 3 months ago

In the pre-training phase, we used 96 A100 GPUs with 4,500 tokens per GPU, set the gradient accumulation steps to 8, and trained for 81k steps. Training took about one week. The instruction tuning phase requires much less compute.
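For reference, a quick back-of-envelope from these numbers (assuming "4,500 tokens per GPU" means tokens per GPU per micro-batch; the variable names below are illustrative, not from the codebase):

```python
# Rough estimate of the effective batch size and total token budget,
# derived only from the figures quoted in the comment above.

num_gpus = 96            # A100 80GB (confirmed later in this thread)
tokens_per_gpu = 4_500   # per GPU, per micro-batch (assumed interpretation)
grad_accum_steps = 8
train_steps = 81_000

tokens_per_step = num_gpus * tokens_per_gpu * grad_accum_steps
total_tokens = tokens_per_step * train_steps

print(f"tokens per optimizer step: {tokens_per_step:,}")  # 3,456,000
print(f"total training tokens:     {total_tokens:,}")     # 279,936,000,000 (~280B)
```

If that reading is right, the effective batch is about 3.5M tokens per optimizer step and the full run covers roughly 280B tokens.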

PyChaser commented 2 months ago

Could you please specify whether you used the 40GB or 80GB version of the A100s to train your model?

JunZhan2000 commented 2 months ago

> Could you please specify whether you used the 40GB or 80GB version of the A100s to train your model?

80GB
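The V100 part of the original question was not answered in the thread. A hedged, memory-only estimate (assuming the 32GB V100 SKU; this ignores throughput, interconnect, and parallelism-strategy differences, so treat it as a rough lower bound rather than the authors' guidance):

```python
# Hypothetical memory-parity estimate: how many V100s provide the same
# aggregate HBM as the reported 96 x A100 80GB. Real requirements depend
# on the sharding strategy and will likely be higher for equal wall-clock.

a100_count = 96
a100_mem_gb = 80   # confirmed above
v100_mem_gb = 32   # largest common V100 SKU; 16GB cards would need ~2x more

v100_count = a100_count * a100_mem_gb / v100_mem_gb
print(f"memory-parity V100 32GB count: {v100_count:.0f}")  # 240
```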