SLURM itself is not important; what matters is that 8 GPUs are generally insufficient to fully finetune Mixtral 8x7B, so multiple GPU nodes are needed, and SLURM is the typical tool for managing such clusters. If you only have 8 GPUs, you may try the PEFT setting and use torchrun instead of srun to launch experiments; if you have access to a GPU cluster that is not managed by SLURM, you may replace srun with platform-specific commands that serve a similar function.
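For reference, a minimal multi-node torchrun launch (run once on each node) could look like the sketch below. The training script name and config path are placeholders, not files from this repo; substitute your actual entry point and arguments.

```bash
# Hedged sketch: replacing srun with torchrun for a 4-node x 8-GPU full SFT run.
# Set NODE_RANK to 0..3 on the respective nodes and MASTER_ADDR to node 0's IP.
# train_sft.py and configs/mixtral_8x7b_sft.yaml are hypothetical placeholders.
torchrun \
  --nnodes=4 \
  --nproc_per_node=8 \
  --node_rank=${NODE_RANK} \
  --master_addr=${MASTER_ADDR} \
  --master_port=29500 \
  train_sft.py --config configs/mixtral_8x7b_sft.yaml
```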
Thank you! Now I can run full SFT finetuning with torchrun on 4 x 8 A100 80GB GPUs.
Could you share your system setup? My run hangs when an epoch is almost finished: GPU memory utilization and network (NIC) traffic both drop to 0.
When I tried to run full finetuning of the mixtral-8x7B sparse model, I found that the srun command is needed. After installing slurm-client with "apt-get install slurm-client" and then running srun xxxxx, it failed. Is the srun command a must for full finetuning of the mixtral-8x7B model?