Comparison of fine-tuning results: qwen-7b-chat vs. qwen1.5-7b-chat

sunyclj commented 5 months ago

Fine-tuning qwen1.5-7b-chat:

deepspeed --num_gpus 2 src/train_bash.py \
  --deepspeed ./Qwen1.5/examples/sft/ds_config_zero2_new.json \
  --stage sft --do_train \
  --model_name_or_path './Qwen1.5-7B-Chat' \
  --dataset huxijin_luxun_alpace \
  --finetuning_type lora \
  --lora_target q_proj,k_proj,v_proj,o_proj,up_proj,gate_proj,down_proj \
  --output_dir qwen1.5_7b_chat_huxijin_luxun \
  --overwrite_cache \
  --num_train_epochs 60 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 8 \
  --save_strategy epoch \
  --lr_scheduler_type cosine \
  --save_steps 1000 \
  --learning_rate 3e-4 \
  --logging_strategy epoch \
  --cutoff_len 1024 \
  --weight_decay 0.1 \
  --adam_beta2 0.95 \
  --warmup_ratio 0.01 \
  --plot_loss --fp16 \
  --lora_rank 1 --lora_alpha 2 --lora_dropout 0.05 \
  --template default

Fine-tuning qwen-7b-chat:

deepspeed --num_gpus 2 src/train_bash.py \
  --deepspeed ./Qwen1.5/examples/sft/ds_config_zero2_new.json \
  --stage sft --do_train \
  --model_name_or_path './Qwen-7B-Chat' \
  --dataset huxijin_luxun_alpace \
  --finetuning_type lora \
  --lora_target c_attn,c_proj,w1,w2 \
  --output_dir qwen1.0_7b_chat_huxijin_luxun1 \
  --overwrite_cache \
  --num_train_epochs 60 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 8 \
  --save_strategy epoch \
  --lr_scheduler_type cosine \
  --save_steps 1000 \
  --learning_rate 3e-4 \
  --logging_strategy epoch \
  --cutoff_len 1024 \
  --weight_decay 0.1 \
  --adam_beta2 0.95 \
  --warmup_ratio 0.01 \
  --plot_loss --fp16 \
  --lora_rank 1 --lora_alpha 2 --lora_dropout 0.05 \
  --template default

The parameter settings are the same and the training data is identical, yet qwen1.5-7b-chat performs worse after fine-tuning. Which aspects should I look at to analyze this?
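For readers comparing the two runs, it can help to spell out what the shared flags imply. The following is a minimal sketch, assuming plain data parallelism across the two GPUs and the standard LoRA formulation in which the low-rank update is scaled by alpha / r; the helper names are hypothetical and are not part of the training script.

```python
# Values copied from the two commands above.
num_gpus = 2
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
lora_rank = 1
lora_alpha = 2


def effective_batch_size(gpus: int, per_device: int, accum: int) -> int:
    """Samples contributing to one optimizer step (assumes data parallelism)."""
    return gpus * per_device * accum


def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA (alpha / r)."""
    return alpha / rank


batch = effective_batch_size(
    num_gpus, per_device_train_batch_size, gradient_accumulation_steps
)
scale = lora_scaling(lora_alpha, lora_rank)
print(f"effective batch size: {batch}, LoRA scaling: {scale}")
```

Under these assumptions both runs take one optimizer step per 32 samples and train rank-1 adapters, so the flag-level differences between them are the base model, the output directory, and the modules listed in --lora_target.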

Jarvanen commented 5 months ago

Roughly how much data do you have that it needs 60 epochs of training?

sunyclj commented 5 months ago

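> Roughly how much data do you have that it needs 60 epochs of training?

The dataset is not large, and it certainly does not need 60 epochs. Testing with the checkpoints from epoch 3 or 5, 1.0 is clearly better than 1.5.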

sunyclj commented 5 months ago

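> Roughly how much data do you have that it needs 60 epochs of training?

On relevance, 1.0 is better; some of the 1.5 test outputs are completely unrelated to the prompt. On text repetition, 1.0 shows almost none, while 1.5 repeats with high probability.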
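To put a number on the repetition difference instead of judging it by eye, one option is a repeated n-gram ratio over each generated output. This is a minimal sketch, not something used in this thread: the function name is hypothetical, and character-level n-grams are an assumption chosen so that it works on Chinese text without a tokenizer.

```python
def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of character n-grams that duplicate an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 mean the text
    is dominated by repeated spans.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 0.0
    return 1.0 - len(set(grams)) / len(grams)


# Hypothetical outputs standing in for real model generations.
clean_output = "abcdefgh"
looping_output = "abababab"

print(repeated_ngram_ratio(clean_output, n=4))
print(repeated_ngram_ratio(looping_output, n=2))
```

Averaging this ratio over the same prompts for both fine-tuned models would give a comparable figure for each checkpoint.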

Jarvanen commented 5 months ago

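> Roughly how much data do you have that it needs 60 epochs of training?

> On relevance, 1.0 is better; some of the 1.5 test outputs are completely unrelated to the prompt. On text repetition, 1.0 shows almost none, while 1.5 repeats with high probability.

I ran into the same problem. I compared the 14B models, and 1.0 follows prompts better than 1.5. In addition, after I increased the number of epochs, lowered the learning rate, and increased the batch size, prompt following with 1.0 also improved; I have not tried this with 1.5 yet.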

juemifuji commented 5 months ago

Continuing to fine-tune on top of 1.5 chat causes fairly obvious catastrophic forgetting.

sunyclj commented 5 months ago

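> Roughly how much data do you have that it needs 60 epochs of training?

> On relevance, 1.0 is better; some of the 1.5 test outputs are completely unrelated to the prompt. On text repetition, 1.0 shows almost none, while 1.5 repeats with high probability.

> I ran into the same problem. I compared the 14B models, and 1.0 follows prompts better than 1.5. In addition, after I increased the number of epochs, lowered the learning rate, and increased the batch size, prompt following with 1.0 also improved; I have not tried this with 1.5 yet.

After I reduced the complexity of the data, 1.0 and 1.5 performed similarly. I will fine-tune another pair later to further confirm whether data complexity is the cause.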

sunyclj commented 5 months ago

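> Continuing to fine-tune on top of 1.5 chat causes fairly obvious catastrophic forgetting.

Yes, and there are also problems with repeated text and a missing end token.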
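One thing worth checking when a fine-tuned chat model never stops is whether the training samples actually end each assistant turn with the stop token. Qwen chat models use the ChatML layout with <|im_start|> and <|im_end|>; the sketch below renders that layout and checks the ending. The helper names are hypothetical, and this is not the template code of any particular training framework.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(messages: list[dict]) -> str:
    """Render {"role", "content"} dicts as ChatML text, one turn per block."""
    return "".join(
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    )


def ends_with_stop_token(sample: str) -> bool:
    """True if the rendered sample terminates with the ChatML end token."""
    return sample.rstrip().endswith(IM_END)


# Hypothetical training example.
sample = format_chatml([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
])
print(sample)
print(ends_with_stop_token(sample))
```

If the text produced by the framework's template for a training example fails a check like this, the model has had no chance to learn when to stop, which would be consistent with the symptom described here. Whether that is the cause in this case is not established in the thread.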

wgimperial commented 5 months ago

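> Roughly how much data do you have that it needs 60 epochs of training?

> On relevance, 1.0 is better; some of the 1.5 test outputs are completely unrelated to the prompt. On text repetition, 1.0 shows almost none, while 1.5 repeats with high probability.

> I ran into the same problem. I compared the 14B models, and 1.0 follows prompts better than 1.5. In addition, after I increased the number of epochs, lowered the learning rate, and increased the batch size, prompt following with 1.0 also improved; I have not tried this with 1.5 yet.

> After I reduced the complexity of the data, 1.0 and 1.5 performed similarly. I will fine-tune another pair later to further confirm whether data complexity is the cause.

I have the same problem. What exactly do you mean by reducing data complexity? Could you provide an example? Also, how much data did you fine-tune on? When I validated with a few hundred labeled samples, fine-tuning had essentially no effect.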

zheganjue519 commented 5 months ago

I ran into the same problem too. With the same data used for SFT, 1.5 performs somewhat worse than 1.0.

jklj077 commented 3 months ago

If you were using examples/finetune.py in this repo, the issue that the finetuned model cannot generate the <|im_end|> token correctly has been fixed in the latest main. Please try using the latest code.
finetune.py in QwenLM/Qwen and examples/finetune.py in QwenLM/Qwen1.5 expect different data formats. In addition, the masking scheme is different. You may experience degraded results if the user part of your data is of poor quality. As stated in the README, we recommend using frameworks tailored for finetuning. The finetune.py script provided in this repo merely serves as a basic example for demonstration and customization.
In any case, you will need to adjust the finetuning hyperparameters to achieve the best result, as Qwen and Qwen1.5 are different models and will have different training dynamics.
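To illustrate why a different masking scheme can make the quality of the user part matter, here is a minimal sketch of two generic label-masking choices for SFT. It is not the code of either finetune.py script, and it does not claim which scheme each script uses; the function name and token ids are hypothetical. The value -100 is the default ignore_index of torch.nn.CrossEntropyLoss.

```python
IGNORE_INDEX = -100  # positions with this label contribute no loss


def build_labels(segments, mask_user=True):
    """Build (input_ids, labels) from a list of (role, token_ids) segments.

    With mask_user=True only assistant tokens are trained on; with
    mask_user=False every token, including the user text, is a target.
    """
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant" or not mask_user:
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return input_ids, labels


# Hypothetical token ids for one user turn and one assistant turn.
segments = [("user", [11, 12, 13]), ("assistant", [21, 22])]
masked = build_labels(segments, mask_user=True)
unmasked = build_labels(segments, mask_user=False)
print(masked[1])
print(unmasked[1])
```

When user tokens are not masked, noisy or low-quality prompts become training targets as well, which is one way the same dataset can yield different results under two scripts.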

QwenLM / Qwen2

Comparison of fine-tuning results: qwen-7b-chat vs. qwen1.5-7b-chat #186