epfLLM / Megatron-LLM

distributed trainer for LLMs

LLaMA2-70B Inference Optimization #92

Closed RaymondHQR closed 5 months ago

RaymondHQR commented 6 months ago

Hi! Your work is excellent. I want to use it to speed up inference for LLaMA2-70B. I have 4 servers with 2 A100 (80 GB) GPUs each, and I would like to combine model (pipeline) parallelism and tensor parallelism to make inference faster. Can you help me? :)
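For reference, a combined launch on this topology might look like the sketch below. Everything here is an assumption to be checked against the Megatron-LLM docs: the entry script name, the exact flag spellings (`--tensor_model_parallel_size`, `--pipeline_model_parallel_size`), and the placeholder addresses. The general idea is that tensor parallelism should stay inside a node (it needs fast NVLink-class bandwidth) while pipeline parallelism spans the slower inter-node links, so with 4 nodes × 2 GPUs a natural split is TP=2, PP=4 (TP × PP = 8 = total GPUs):

```shell
# Hypothetical multi-node launch: run once per node, with NODE_RANK set
# to 0..3 and MASTER_ADDR pointing at node 0. Script and flag names are
# assumptions -- verify them against the Megatron-LLM getting-started guide.
torchrun --nproc_per_node 2 \
         --nnodes 4 \
         --node_rank "$NODE_RANK" \
         --master_addr "$MASTER_ADDR" \
         --master_port 6000 \
         finetune.py \
         --tensor_model_parallel_size 2 \
         --pipeline_model_parallel_size 4 \
         --load /path/to/llama2-70b-checkpoint
```

Note that the checkpoint must first be resharded to match the chosen TP/PP sizes; Megatron-style repos usually ship a checkpoint conversion/resharding tool for this.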

kylematoba commented 6 months ago

Can you be more specific about what you're trying to do? I think Megatron generally supports model and tensor parallelism? Did you look at https://epfllm.github.io/Megatron-LLM/guide/getting_started.html?