[BUG] Abnormal loss during LoRA fine-tuning? #1214

Closed estuday closed 1 week ago

estuday commented 2 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior


python finetune.py \
  --model_name_or_path $MODEL \
  --data_path $DATA \
  --fp16 True \
  --output_dir output_qwen \
  --num_train_epochs 1 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 1000 \
  --save_total_limit 10 \
  --learning_rate 3e-4 \
  --weight_decay 0.1 \
  --adam_beta2 0.95 \
  --warmup_ratio 0.01 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --report_to "none" \
  --model_max_length 512 \
  --lazy_preprocess True \
  --gradient_checkpointing \
  --use_lora \
  --q_lora \
  --deepspeed ./ds_config_zero2.json

The dataset contains about 36,000 samples, all of them single-turn conversations. An example:

{"id": "0", "conversations": [ {"from": "user", "value": "世界上最早的报纸诞生于"}, {"from": "assistant", "value": "中国。北宋末年(公元11,12世纪)出现的印刷报纸,不仅是中国新闻史上最早的印刷报纸,也是世界新闻史上最早的印刷报纸.中国新闻事业历史的悠久,内容的丰富,是任何西方国家都难以比肩的.<e>中国古代的报纸产生于中国的封建社会时期,是封建地主阶级及其政治代表占统治地位的封建自然经济通过新闻手段的反映.在漫长的封建社会时期,中国古代的报纸,不论是官方的邸报,还是民办的小报和京报,都必然要和当时的封建统治者保持一定的联系,受他们的制约.官方的邸报固然是封建统治阶级的喉舌和御用的宣传工具,民办的小报和京报也只能在封建统治阶级的控制下活动,不能越雷池一步.封建统治者绝不允许可以自由报道一切消息和自由发表一切意见的报纸存在.中国古代的报纸在为当时的读者提供朝野政治和社会信息方面确实起过一定的作用,但始终没有摆脱统治阶级的掌握.中国古代报纸的历史,基本上是一部封建统治阶级掌握传播媒介,控制舆论工具,限制言论出版自由的历史.<e>中国古代的邸报有1200年左右的历史.小报有近千年的历史.民间报房出版的邸报,京报有近400年的历史.它们从诞生到结束,持续的时间都不算短,但发展不快,形式内容的变化不大."}] }

During fine-tuning, the loss becomes very small after only a few hundred steps, staying at around 0: [image]

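Expected Behavior

I expected the model to correctly reproduce the content of the fine-tuning data. Because I was using a public dataset, I initially assumed it might already have been used in the official pretraining, which would explain the very low loss. However, after merging the weights and testing the fine-tuned model, the results were poor, and as the number of training steps increases, the model's output also becomes repetitive. [image]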

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

estuday commented 2 months ago

Adding the model output after one full epoch of training: [image]

jklj077 commented 2 months ago

How did you conduct the inference?

estuday commented 2 months ago

> How did you conduct the inference?


jklj077 commented 2 months ago

Please first try raising the repetition penalty, the temperature, and top_p in generation_config.json.
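As a rough illustration of that suggestion, the sketch below raises the three parameters in a generation_config.json file using only the standard library. The helper name `raise_sampling_params` and the target values (1.2, 0.9, 0.9) are assumptions chosen for illustration, not values recommended in this thread; the key names `repetition_penalty`, `temperature`, and `top_p` are the ones used by the Transformers generation config.

```python
import json
from pathlib import Path


def raise_sampling_params(config_path, repetition_penalty=1.2,
                          temperature=0.9, top_p=0.9):
    """Raise sampling parameters in a generation_config.json file.

    Hypothetical helper: the default targets are illustrative only.
    Existing values that are already higher are left untouched.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    targets = {
        "repetition_penalty": repetition_penalty,
        "temperature": temperature,
        "top_p": top_p,
    }
    for key, target in targets.items():
        current = config.get(key)
        # Only ever increase a value; set it if missing or not numeric.
        if not isinstance(current, (int, float)) or current < target:
            config[key] = target

    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n",
                    encoding="utf-8")
    return config
```

Calling `raise_sampling_params("path/to/merged_model/generation_config.json")` (the path is a placeholder) rewrites the file in place; other keys in the config are preserved.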

estuday commented 2 months ago

> Please first try adjusting the repetition penalty (higher), the temperature (higher), and the top_p (higher) in the generation_config.json.

Hi, I adjusted these parameters and the results seemed better, but the model did not stop in time and kept generating, which looks more like the behavior of a base model than a chat model. [image]

jklj077 commented 1 month ago


It appears that something is wrong with the stopping criteria. Normally, model.chat handles that for you, but it may be worth double-checking. If you are using transformers directly, you need to check generation_config.json and make sure the eos_token is set properly (it should be <|im_end|> 151645 and <|endoftext|> 151643).

I would advise you to migrate to Qwen1.5, though, as Qwen1.0 and its code are not actively maintained.
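The eos_token check described above can be sketched as follows, using only the standard library. The helper name `ensure_eos_token_ids` is hypothetical; the two token ids are the ones quoted in this comment, and the sketch assumes the generation config stores them under the `eos_token_id` key, which in Transformers may be a single integer or a list of integers.

```python
import json
from pathlib import Path

# Token ids quoted in the comment above:
# <|im_end|> = 151645, <|endoftext|> = 151643.
EXPECTED_EOS_IDS = [151645, 151643]


def ensure_eos_token_ids(config_path, expected=EXPECTED_EOS_IDS):
    """Make sure every expected id is listed under `eos_token_id`.

    Hypothetical helper: reads generation_config.json, appends any
    missing ids, and rewrites the file only when something changed.
    Returns the resulting list of end-of-sequence ids.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    current = config.get("eos_token_id")
    if current is None:
        current_ids = []
    elif isinstance(current, int):
        current_ids = [current]  # a single id is allowed; normalize to a list
    else:
        current_ids = list(current)

    missing = [token_id for token_id in expected if token_id not in current_ids]
    if missing:
        current_ids = current_ids + missing
        config["eos_token_id"] = current_ids
        path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n",
                        encoding="utf-8")
    return current_ids
```

If the returned list already contained both ids before the call, the stop tokens are configured and the early-stopping problem likely lies elsewhere, for example in how the prompt is formatted at inference time.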
