Ahnsun / merlin

[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
https://ahnsun.github.io/merlin/

about training cost #2

Open cyj95 opened 3 months ago

cyj95 commented 3 months ago

What's the difference between the Merlin and Merlin-Chat models? How long does it take to train the model, and how many GPUs does training require?

Ahnsun commented 3 months ago

Hi, Merlin is the pretrained weights, and Merlin-Chat is the weights after supervised fine-tuning (SFT). The entire training process was run on 64 NVIDIA A800 GPUs, taking approximately 12 hours for pre-training and 3 hours for instruction tuning. Detailed information will be added in the next version of Merlin. Best
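For a rough sense of scale, the figures quoted above can be converted into GPU-hours. This is only a back-of-the-envelope sketch: the function name `gpu_hours` is made up for illustration, and the numbers are the approximate ones from the reply, not measured values.

```python
# Approximate figures quoted in the reply above.
NUM_GPUS = 64          # NVIDIA A800 GPUs
PRETRAIN_HOURS = 12    # wall-clock hours for pre-training (Merlin)
SFT_HOURS = 3          # wall-clock hours for instruction tuning (Merlin-Chat)


def gpu_hours(num_gpus, wall_clock_hours):
    """Hypothetical helper: total GPU-hours = GPUs x wall-clock hours."""
    return num_gpus * wall_clock_hours


pretrain_cost = gpu_hours(NUM_GPUS, PRETRAIN_HOURS)
sft_cost = gpu_hours(NUM_GPUS, SFT_HOURS)
total_cost = pretrain_cost + sft_cost

print(f"pre-training:       {pretrain_cost} GPU-hours")
print(f"instruction tuning: {sft_cost} GPU-hours")
print(f"total:              {total_cost} GPU-hours")
```

That works out to about 768 GPU-hours for pre-training and 192 for instruction tuning, roughly 960 in total. Wall-clock time on a smaller cluster will not necessarily scale linearly from these numbers, since batch size, memory limits, and communication overhead all change with the GPU count.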