Closed RaymondHQR closed 5 months ago
Hi! Your work is excellent. I'd like to use it to speed up inference for LLaMA2-70B. I have 4 servers, each with 2 A100 (80 GB) GPUs, and I want to combine model parallelism and tensor parallelism to make inference faster. Can you help me? :)
Can you be more specific about what you're trying to do? I think Megatron generally supports model and tensor parallelism. Did you look at https://epfllm.github.io/Megatron-LLM/guide/getting_started.html?
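For the 4-node, 2-GPU-per-node setup described above, a natural split is tensor parallelism within each node (2 GPUs share NVLink/PCIe bandwidth) and pipeline parallelism across the 4 nodes, for a total of 8 GPUs. Below is a rough launch sketch using the `--tensor-model-parallel-size` / `--pipeline-model-parallel-size` flags from upstream Megatron-LM; the exact script name, flag spellings, and checkpoint arguments in the Megatron-LLM fork may differ, so treat everything here as an assumption to check against the getting-started guide:

```shell
# Hypothetical launch sketch, NOT verified against Megatron-LLM.
# Run once per node, with NODE_RANK set to 0..3 and MASTER_ADDR
# pointing at node 0. TP=2 (GPUs within a node), PP=4 (across nodes),
# so TP * PP = 8 = total GPU count.
torchrun \
    --nproc_per_node 2 \
    --nnodes 4 \
    --node_rank "$NODE_RANK" \
    --master_addr "$MASTER_ADDR" \
    --master_port 6000 \
    text_generation.py \
    --tensor-model-parallel-size 2 \
    --pipeline-model-parallel-size 4 \
    --load /path/to/llama2-70b-megatron-checkpoint
```

Note that the checkpoint itself must be resharded to match the chosen TP/PP degrees before it can be loaded; Megatron-based codebases generally ship a conversion/resharding tool for this.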