SmartFlowAI / Llama3-Tutorial

Llama3-Tutorial (XTuner, LMDeploy, OpenCompass)

How to deploy and fine-tune Llama 3 on a multi-GPU machine #12

Open AllYoung opened 4 months ago

AllYoung commented 4 months ago

Does Llama 3 support inference and fine-tuning on multi-GPU machines? Could you please add some sample code for a single machine with multiple cards?

fanqiNO1 commented 4 months ago

For deployment, you can refer to https://lmdeploy.readthedocs.io/en/latest/get_started.html#serving and specify the --tp argument.

Take the api server as an example (https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html):

lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333 --tp 2

When you specify --tp 2, the model weights will be partitioned across 2 cards.
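The same command works for Llama 3. A minimal sketch, assuming the model id below (any local path or Hugging Face repo supported by lmdeploy also works):

# serve Llama 3 with tensor parallelism across 2 GPUs
lmdeploy serve api_server meta-llama/Meta-Llama-3-8B-Instruct --server-port 23333 --tp 2

Once the server is up, you can query its OpenAI-compatible endpoint; the "model" field should match a name returned by GET /v1/models:

curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'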

For fine-tuning, you can refer to https://github.com/InternLM/xtuner/tree/main?tab=readme-ov-file#fine-tune-

For example,

(DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
(SLURM) srun ${SRUN_ARGS} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
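For Llama 3 specifically, a hedged example on a single node with 2 GPUs (the config name is an assumption; run xtuner list-cfg to see the Llama 3 configs shipped with your xtuner version):

# single node, 2 GPUs, QLoRA fine-tuning with DeepSpeed ZeRO-2
NPROC_PER_NODE=2 xtuner train llama3_8b_instruct_qlora_alpaca_e3 --deepspeed deepspeed_zero2

DeepSpeed ZeRO-2 shards optimizer states and gradients across the GPUs, which is typically sufficient for QLoRA fine-tuning of an 8B model.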