HuangLK / transpeeder

train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism

Output is not getting saved #29

Open dittops opened 1 year ago

dittops commented 1 year ago

I tried fine-tuning LLaMA 30B on a node with two A100 80 GB GPUs. The script finished running in about 5 minutes, but no output was generated. I couldn't find any error either.


The command used to run the script:

```
deepspeed --include A1:0,1 --master_port 22384 train.py \
    --output_dir output \
    --init_ckpt /root/llama-30b-init-ckpt/ \
    --data_path /root/alpaca_deepspeed.json \
    --max_seq_len 1024 \
    --train_steps 1000 \
    --eval_steps 10 \
    --save_steps 200 \
    --log_steps 1 \
    --pipe_parallel_size 2 \
    --model_parallel_size 1 \
    --use_flash_attn true \
    --deepspeed_config ./configs/ds_config_zero1.json
```
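As a quick sanity check while the job runs, here is a minimal sketch (standard library only; the `output` path is just the `--output_dir` value from the command above) that polls the output directory to see whether any checkpoint files are being written at all:

```python
# Sketch: periodically list the contents of --output_dir to check whether
# checkpoints ever appear. With --save_steps 200, something should show up
# well before --train_steps 1000 completes.
import time
from pathlib import Path

output_dir = Path("output")  # assumed: matches --output_dir in the command above

for _ in range(12):  # check once a minute for ~12 minutes
    files = sorted(p.name for p in output_dir.rglob("*") if p.is_file())
    print(f"{len(files)} file(s) under {output_dir}: {files[:5]}")
    time.sleep(60)
```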

HuangLK commented 1 year ago

It might just be that training is very slow. You can check the GPU usage to confirm it is actually running.
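If it helps, a minimal sketch for that check (assuming `nvidia-smi` is on the PATH) that polls per-GPU utilization, so you can tell whether the job is still computing or exited early:

```python
# Sketch: sample GPU utilization via nvidia-smi. Sustained 0% on both GPUs
# would suggest the run exited early rather than training slowly.
import subprocess
import time

def gpu_utilization():
    """Return per-GPU utilization percentages reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [int(line) for line in out.strip().splitlines()]

for _ in range(10):  # sample roughly every 5 seconds
    print("GPU utilization (%):", gpu_utilization())
    time.sleep(5)
```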