dvlab-research / LongLoRA

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
http://arxiv.org/abs/2309.12307
Apache License 2.0

The loss is very unstable when supervised fine-tuning the 7b-100k-ft model #168

Open seanxuu opened 9 months ago

seanxuu commented 9 months ago

When I use the LongAlpaca-12k dataset to supervised fine-tune the LongAlpaca-7B model, the loss is very unstable. My command is:

Miniconda/envs/longlora/bin/python -u supervised-fine-tune.py \
        --model_name_or_path models/LongAlpaca-7B \
        --bf16 True \
        --output_dir LongLoRA/save/LongAlpaca-7B-origdata \
        --model_max_length 32768 \
        --use_flash_attn True \
        --data_path data/LongAlpaca-12k.json \
        --low_rank_training True \
        --num_train_epochs 3 \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 2 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy no \
        --save_strategy steps \
        --save_steps 1000 \
        --save_total_limit 2 \
        --learning_rate 2e-5 \
        --weight_decay 0.0 \
        --warmup_steps 20 \
        --lr_scheduler_type constant_with_warmup \
        --logging_steps 1 \
        --deepspeed ds_configs/stage2.json \
        --tf32 True

The loss curve looks like this:

[screenshot: training loss curve]
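
For what it may be worth: with `--per_device_train_batch_size 1` and `--gradient_accumulation_steps 1`, every logged value comes from a single (up to 32k-token) example, so large step-to-step variance is expected when logging every step. A minimal sketch for checking whether the trend is still downward, assuming the standard Hugging Face Trainer output layout (`trainer_state.json` inside each checkpoint directory; the exact path below is hypothetical):

```python
import json

# Hypothetical path; the Trainer writes trainer_state.json into each
# checkpoint directory under output_dir (e.g. checkpoint-1000/).
STATE_PATH = "LongLoRA/save/LongAlpaca-7B-origdata/checkpoint-1000/trainer_state.json"


def smoothed_losses(state_path: str, beta: float = 0.98):
    """Return (step, raw_loss, ema_loss) tuples from the Trainer log history."""
    with open(state_path) as f:
        history = json.load(f)["log_history"]
    ema, rows = None, []
    for entry in history:
        if "loss" not in entry:  # skip eval / final-summary entries
            continue
        raw = entry["loss"]
        ema = raw if ema is None else beta * ema + (1 - beta) * raw
        rows.append((entry["step"], raw, ema))
    return rows


if __name__ == "__main__":
    for step, raw, ema in smoothed_losses(STATE_PATH)[-10:]:
        print(f"step {step:6d}  raw {raw:.4f}  ema {ema:.4f}")
```

If the exponential moving average keeps drifting down, the spikes are most likely just per-example noise rather than divergence.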

seanxuu commented 9 months ago

I also tried to train Llama-2-7b-longlora-100k-ft with my own dataset, which is sampled from your LongAlpaca-12k.json data, but the loss looks the same. [screenshot: training loss curve]

python supervised-fine-tune.py  \
        --model_name_or_path /models/Llama-2-7b-longlora-100k-ft \
        --bf16 True \
        --output_dir LongLoRA/save/7b-100k-ft-origdata-mydata       \
        --model_max_length 100000 \
        --use_flash_attn True \
        --data_path LongLoRA/pdf2txt/output/manual_data.json \
        --low_rank_training True \
        --num_train_epochs 5  \
        --per_device_train_batch_size 1     \
        --per_device_eval_batch_size 2     \
        --gradient_accumulation_steps 8     \
        --evaluation_strategy "no"     \
        --save_strategy "steps"     \
        --save_steps 98     \
        --save_total_limit 2     \
        --learning_rate 2e-5     \
        --weight_decay 0.0     \
        --warmup_steps 20     \
        --lr_scheduler_type "constant_with_warmup"     \
        --logging_steps 1     \
        --deepspeed "ds_configs/stage2.json" \
        --tf32 True
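
In case it helps narrow this down: even with `--gradient_accumulation_steps 8`, each logged loss still reflects one sample per device, and with a small custom dataset the spikes often track outlier sequence lengths. A quick sanity check on the length spread of the data, assuming the alpaca-style JSON layout with "instruction" and "output" fields (both paths are taken from the command above; adjust them to your setup):

```python
import json
from transformers import AutoTokenizer

# Paths copied from the command above; treat them as placeholders.
MODEL_PATH = "/models/Llama-2-7b-longlora-100k-ft"
DATA_PATH = "LongLoRA/pdf2txt/output/manual_data.json"


def length_stats(data_path: str, model_path: str) -> None:
    """Print the token-length spread of the fine-tuning samples."""
    tok = AutoTokenizer.from_pretrained(model_path, use_fast=True)
    with open(data_path) as f:
        samples = json.load(f)
    lengths = sorted(
        len(tok(s.get("instruction", "") + s.get("output", "")).input_ids)
        for s in samples
    )
    n = len(lengths)
    print(f"samples: {n}")
    print(f"min / median / max tokens: {lengths[0]} / {lengths[n // 2]} / {lengths[-1]}")


if __name__ == "__main__":
    length_stats(DATA_PATH, MODEL_PATH)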