Closed: tzjtatata closed this issue 7 months ago
Hi, @tzjtatata
Thank you for your interest in our work.
To evaluate a 34b model, you can use tensor parallelism to achieve this on 2 A100 40G GPUs.
Using llava 1.6 34b as an example, you can pass `--model_args "pretrained=liuhaotian/llava-v1.6-34b,conv_template=mistral_direct,device_map=auto"` and start your job without using `accelerate launch`.
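For reference, a minimal sketch of the full single-process invocation, assuming the `llava` model type, the `mme` task, and an lm-eval-harness-style entry point (the task and output path here are placeholders, not from this thread; adjust to your setup):

```bash
# One process only: device_map=auto shards the 34b weights across
# both A100 40G GPUs (tensor parallel), so no accelerate launch.
python -m lmms_eval \
    --model llava \
    --model_args "pretrained=liuhaotian/llava-v1.6-34b,conv_template=mistral_direct,device_map=auto" \
    --tasks mme \
    --batch_size 1 \
    --output_path ./logs/
```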
Thank you for the careful reply. Can I use accelerate with 4 processes x 2 GPUs each?
@tzjtatata, unfortunately it might not be possible to do so, since currently the number of processes has to be 1 to activate tensor parallelism. We will try to work on features that support batch sizes larger than 1 in the future.
Thank you!
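For contrast, a hedged sketch of the multi-process data-parallel mode that `accelerate launch` does support, again with placeholder model and task names; each process holds a full model replica on its own GPU, which is why this mode cannot be combined with the 2-GPU tensor-parallel sharding above:

```bash
# Data parallel: one process per GPU, each with a full model copy.
# Works for models that fit on a single A100 40G (e.g. a 13b checkpoint);
# a sharded 34b model must instead use the single-process command above.
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava \
    --model_args "pretrained=liuhaotian/llava-v1.5-13b" \
    --tasks mme \
    --batch_size 1
```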
https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/4
Yes, it's addressed in this PR.
Thank you for your great work. I want to evaluate llava-next 34b, but I only have A100 40G GPUs, which is not enough to run inference on a 34b model on a single GPU. Can you give me some advice?