SLURM itself is not important; what matters is that 8 GPUs are generally insufficient to fully finetune Mixtral 8x7B, so multiple GPU nodes are needed, and SLURM is the typical tool for managing such clusters. If you only have 8 GPUs, you may try the PEFT setting and use torchrun instead of srun to launch experiments; if you have access to a GPU cluster that is not managed by SLURM, you may replace srun with platform-specific commands that serve a similar function.
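For reference, a minimal multi-node torchrun launch (run once on each node) could look like the sketch below. The training script name and config path are placeholders, not files from this repo; substitute your actual entry point and arguments.

```bash
# Hedged sketch: replacing srun with torchrun for a 4-node x 8-GPU full SFT run.
# Set NODE_RANK to 0..3 on the respective nodes and MASTER_ADDR to node 0's IP.
# train_sft.py and configs/mixtral_8x7b_sft.yaml are hypothetical placeholders.
torchrun \
  --nnodes=4 \
  --nproc_per_node=8 \
  --node_rank=${NODE_RANK} \
  --master_addr=${MASTER_ADDR} \
  --master_port=29500 \
  train_sft.py --config configs/mixtral_8x7b_sft.yaml
```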
Thank you! Now I can run full SFT finetuning with torchrun on 4 x 8 A100 80GB GPUs.
Could you share your system setup? My run hangs when an epoch is almost finished: GPU memory utilization and network (NIC) traffic both drop to 0.
When I tried to run full finetuning of the mixtral-8x7B sparse model, I found that the srun command is needed. After installing slurm-client with "apt-get install slurm-client" and then running srun xxxxx, it failed. Is the srun command a must for full finetuning of the mixtral-8x7B model?