dvlab-research / LongLoRA

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
http://arxiv.org/abs/2309.12307
Apache License 2.0

The loss is very unstable when supervised fine-tuning the 7b-100k-ft model #168

Open seanxuu opened 9 months ago

seanxuu commented 9 months ago

When I use the LongAlpaca-12k dataset to supervised fine-tune the LongAlpaca-7B model, the loss is very unstable. My command is:

Miniconda/envs/longlora/bin/python -u supervised-fine-tune.py \
        --model_name_or_path models/LongAlpaca-7B \
        --bf16 True \
        --output_dir LongLoRA/save/LongAlpaca-7B-origdata \
        --model_max_length 32768 \
        --use_flash_attn True \
        --data_path data/LongAlpaca-12k.json \
        --low_rank_training True \
        --num_train_epochs 3 \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 2 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy no \
        --save_strategy steps \
        --save_steps 1000 \
        --save_total_limit 2 \
        --learning_rate 2e-5 \
        --weight_decay 0.0 \
        --warmup_steps 20 \
        --lr_scheduler_type constant_with_warmup \
        --logging_steps 1 \
        --deepspeed ds_configs/stage2.json \
        --tf32 True

The loss curve looks like this:

[screenshot: training loss curve]
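
For what it may be worth: with `--per_device_train_batch_size 1` and `--gradient_accumulation_steps 1`, every logged value comes from a single (up to 32k-token) example, so large step-to-step variance is expected when logging every step. A minimal sketch for checking whether the trend is still downward, assuming the standard Hugging Face Trainer output layout (`trainer_state.json` inside each checkpoint directory; the exact path below is hypothetical):

```python
import json

# Hypothetical path; the Trainer writes trainer_state.json into each
# checkpoint directory under output_dir (e.g. checkpoint-1000/).
STATE_PATH = "LongLoRA/save/LongAlpaca-7B-origdata/checkpoint-1000/trainer_state.json"


def smoothed_losses(state_path: str, beta: float = 0.98):
    """Return (step, raw_loss, ema_loss) tuples from the Trainer log history."""
    with open(state_path) as f:
        history = json.load(f)["log_history"]
    ema, rows = None, []
    for entry in history:
        if "loss" not in entry:  # skip eval / final-summary entries
            continue
        raw = entry["loss"]
        ema = raw if ema is None else beta * ema + (1 - beta) * raw
        rows.append((entry["step"], raw, ema))
    return rows


if __name__ == "__main__":
    for step, raw, ema in smoothed_losses(STATE_PATH)[-10:]:
        print(f"step {step:6d}  raw {raw:.4f}  ema {ema:.4f}")
```

If the exponential moving average keeps drifting down, the spikes are most likely just per-example noise rather than divergence.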

seanxuu commented 9 months ago

I also tried to train Llama-2-7b-longlora-100k-ft with my own dataset, which is sampled from your LongAlpaca-12k.json data, but the loss looks the same. [screenshot: training loss curve]

python supervised-fine-tune.py  \
        --model_name_or_path /models/Llama-2-7b-longlora-100k-ft \
        --bf16 True \
        --output_dir LongLoRA/save/7b-100k-ft-origdata-mydata       \
        --model_max_length 100000 \
        --use_flash_attn True \
        --data_path LongLoRA/pdf2txt/output/manual_data.json \
        --low_rank_training True \
        --num_train_epochs 5  \
        --per_device_train_batch_size 1     \
        --per_device_eval_batch_size 2     \
        --gradient_accumulation_steps 8     \
        --evaluation_strategy "no"     \
        --save_strategy "steps"     \
        --save_steps 98     \
        --save_total_limit 2     \
        --learning_rate 2e-5     \
        --weight_decay 0.0     \
        --warmup_steps 20     \
        --lr_scheduler_type "constant_with_warmup"     \
        --logging_steps 1     \
        --deepspeed "ds_configs/stage2.json" \
        --tf32 True
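
In case it helps narrow this down: even with `--gradient_accumulation_steps 8`, each logged loss still reflects one sample per device, and with a small custom dataset the spikes often track outlier sequence lengths. A quick sanity check on the length spread of the data, assuming the alpaca-style JSON layout with "instruction" and "output" fields (both paths are taken from the command above; adjust them to your setup):

```python
import json
from transformers import AutoTokenizer

# Paths copied from the command above; treat them as placeholders.
MODEL_PATH = "/models/Llama-2-7b-longlora-100k-ft"
DATA_PATH = "LongLoRA/pdf2txt/output/manual_data.json"


def length_stats(data_path: str, model_path: str) -> None:
    """Print the token-length spread of the fine-tuning samples."""
    tok = AutoTokenizer.from_pretrained(model_path, use_fast=True)
    with open(data_path) as f:
        samples = json.load(f)
    lengths = sorted(
        len(tok(s.get("instruction", "") + s.get("output", "")).input_ids)
        for s in samples
    )
    n = len(lengths)
    print(f"samples: {n}")
    print(f"min / median / max tokens: {lengths[0]} / {lengths[n // 2]} / {lengths[-1]}")


if __name__ == "__main__":
    length_stats(DATA_PATH, MODEL_PATH)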