LLaVA-VL / LLaVA-NeXT


Resources required for training with 72B language models #175

Open · annopackage opened this issue 3 months ago

annopackage commented 3 months ago

Hi, thanks for your great work. I was wondering how many GPUs are needed to train LLaVA-NeXT with a 72B LLM.

Luodian commented 3 months ago

A minimum of 128 GPUs with 80GB each (H100/H800 or A100/A800).
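For intuition, a rough back-of-envelope (a sketch, assuming Adam with mixed precision and the ZeRO paper's ~16 bytes per parameter for weights, gradients, and optimizer states):

```python
# Hedged estimate: sharded model/optimizer state memory under ZeRO-3.
# Assumes Adam + mixed precision: 2 B (bf16 params) + 2 B (bf16 grads)
# + 12 B (fp32 master params + Adam moments) = 16 bytes per parameter.
params = 72e9
total_states_gib = params * 16 / 1024**3      # ~1073 GiB across the cluster
per_gpu_gib = total_states_gib / 128          # ~8.4 GiB per GPU when sharded
print(f"{total_states_gib:.0f} GiB total, {per_gpu_gib:.1f} GiB/GPU")
```

The rest of each 80GB card goes to activations, communication buffers, and the vision tower, which is why fewer GPUs quickly run out of headroom.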

NicoZenith commented 2 months ago

Thanks for your answer! However, I guess we will also need model parallelism even with 80GB GPUs. This is not set in the training code; should we add device_map='auto' when loading the model?

Luodian commented 2 months ago

> Thanks for your answer! However, I guess we will also need model parallelism even with 80GB GPUs. This is not set in the training code; should we add device_map='auto' when loading the model?

You don't need to set this, since DeepSpeed ZeRO-3 already handles it. You can see these args in the training scripts.
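For reference, ZeRO-3 shards parameters, gradients, and optimizer states across all ranks, so device_map='auto' (which does naive layer-wise placement, mainly for inference) is unnecessary. Below is a minimal sketch of the kind of ZeRO-3 JSON config that training scripts pass via the --deepspeed flag; it is illustrative, not the repo's exact file:

```json
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

With such a config, the transformers Trainer initializes the model through DeepSpeed, and each rank materializes only its shard of the weights.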