annopackage opened this issue 3 months ago
A minimum of 128 GPUs with 80 GB each (H100/H800 or A100/A800).
Thanks for your answer! However, I guess we will also need model parallelism even with 80 GB GPUs. This is not set in the training code; should we add `device_map='auto'` when loading the model?
You don't need to set this, since DeepSpeed ZeRO-3 already handles it. You can see the relevant args in the training scripts.
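For context: when ZeRO stage 3 is enabled in the DeepSpeed config passed to the training script, parameters are sharded across all ranks automatically, so `device_map='auto'` (a naive inference-time placement strategy) is unnecessary and would conflict. A minimal sketch of such a config (the field names are standard DeepSpeed keys; the specific values here are illustrative, not taken from this repo's scripts):

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": {
    "enabled": true
  },
  "train_micro_batch_size_per_gpu": "auto"
}
```

With `"stage": 3`, each GPU holds only a shard of the weights, gradients, and optimizer states, which is what makes a 72B model trainable across 80 GB GPUs without any `device_map` setting.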
Hi, thanks for your great work. I was wondering how many GPUs are needed to train LLaVA-NeXT with a 72B LLM.