baichuan-inc / Baichuan-13B

A 13B large language model developed by Baichuan Intelligent Technology
https://huggingface.co/baichuan-inc/Baichuan-13B-Chat
Apache License 2.0

baichuan-13b-chat SFT fine-tuning: loss does not decrease #188

Open xiaohuihwh opened 1 year ago

xiaohuihwh commented 1 year ago

Fine-tuning both Baichuan 1 and Baichuan 2 with LLaMA-Efficient-Tuning; the loss does not decrease in either case.

xiaohuihwh commented 1 year ago

Fine-tuning baichuan1 and baichuan2 with DeepSpeed on two 4090s, the loss starts at 1.9 and after a few dozen steps settles around 1.5 without decreasing further. The parameters are as follows:

```shell
deepspeed --num_gpus 2 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config_stage3.json \
    --stage sft \
    --model_name_or_path /root/autodl-tmp/baichuan-13b-chat/ \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir src/output/sft-0921 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 500 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --bf16
```

The DeepSpeed config file is as follows. [image attachment]
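The actual `ds_config_stage3.json` was attached as an image and is not readable here. For reference only, a minimal ZeRO stage-3 config consistent with the flags above (notably `--bf16` and the batch-size arguments, which DeepSpeed can inherit via `"auto"`) might look like the sketch below; the values in the original attachment may differ:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

With `"auto"` values, the launcher (here, train_bash.py via the HuggingFace Trainer integration) fills in the batch size, gradient accumulation, and precision settings from the command-line flags, so the JSON and CLI stay consistent.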