Open dittops opened 1 year ago
I tried fine-tuning LLaMA 30B on a node with 2× A100 80 GB GPUs. The script finished in about 5 minutes with no output generated, and I couldn't find any error either.
The command used to run the script:

```shell
deepspeed --include A1:0,1 --master_port 22384 train.py \
  --output_dir output \
  --init_ckpt /root/llama-30b-init-ckpt/ \
  --data_path /root/alpaca_deepspeed.json \
  --max_seq_len 1024 \
  --train_steps 1000 \
  --eval_steps 10 \
  --save_steps 200 \
  --log_steps 1 \
  --pipe_parallel_size 2 \
  --model_parallel_size 1 \
  --use_flash_attn true \
  --deepspeed_config ./configs/ds_config_zero1.json
```
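For reference, a minimal ZeRO stage-1 DeepSpeed config along the lines of the `ds_config_zero1.json` referenced above might look like the sketch below. The batch-size values here are illustrative assumptions, not the values from the repo; with `pipe_parallel_size 2` on 2 GPUs the data-parallel degree is 1, so `train_batch_size = micro_batch × grad_accum × 1`:

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 2,
  "gradient_accumulation_steps": 16,
  "steps_per_print": 1,
  "zero_optimization": {
    "stage": 1
  },
  "fp16": {
    "enabled": true
  }
}
```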
It might be that the training is just very slow rather than finished; check the GPU utilization (e.g. with `nvidia-smi`) to see whether the processes are actually doing work.
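As a quick way to do that check, here is a small hypothetical helper that polls per-GPU utilization through `nvidia-smi`'s query interface (it returns `None` on machines where `nvidia-smi` is not installed):

```python
import shutil
import subprocess


def gpu_utilization():
    """Return a list of per-GPU utilization percentages, or None if
    nvidia-smi is not available on this machine."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    # One integer per line, one line per GPU.
    return [int(x) for x in out.stdout.split()]


if __name__ == "__main__":
    print(gpu_utilization())
```

If both GPUs sit near 0% while the training processes are alive, the run is likely stuck (e.g. on data loading or a collective) rather than training slowly.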