SmartFlowAI / Llama3-Tutorial

Llama3-Tutorial (XTuner, LMDeploy, OpenCompass)

How to deploy and fine-tune Llama 3 on a multi-GPU machine #12

Open AllYoung opened 4 months ago

AllYoung commented 4 months ago

Does Llama 3 support inference and fine-tuning on multi-GPU machines? Could you please add some sample code for a single machine with multiple cards?

fanqiNO1 commented 4 months ago

For deployment, you can refer to https://lmdeploy.readthedocs.io/en/latest/get_started.html#serving and specify the --tp argument.

Take the api server as an example (https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html):

lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333 --tp 2

When you specify --tp 2, the model weights will be partitioned across 2 cards.
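The same command works for Llama 3. A minimal sketch, assuming the model id below (any local path or Hugging Face repo supported by lmdeploy also works):

# serve Llama 3 with tensor parallelism across 2 GPUs
lmdeploy serve api_server meta-llama/Meta-Llama-3-8B-Instruct --server-port 23333 --tp 2

Once the server is up, you can query its OpenAI-compatible endpoint; the "model" field should match a name returned by GET /v1/models:

curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'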

For fine-tuning, you can refer to https://github.com/InternLM/xtuner/tree/main?tab=readme-ov-file#fine-tune-

For example,

(DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
(SLURM) srun ${SRUN_ARGS} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
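For Llama 3 specifically, a hedged example on a single node with 2 GPUs (the config name is an assumption; run xtuner list-cfg to see the Llama 3 configs shipped with your xtuner version):

# single node, 2 GPUs, QLoRA fine-tuning with DeepSpeed ZeRO-2
NPROC_PER_NODE=2 xtuner train llama3_8b_instruct_qlora_alpaca_e3 --deepspeed deepspeed_zero2

DeepSpeed ZeRO-2 shards optimizer states and gradients across the GPUs, which is typically sufficient for QLoRA fine-tuning of an 8B model.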