After fine-tuning, the same checkpoint gives very different results on the same validation set in evaluate mode and in deployment mode

THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Apache License 2.0

39.96k stars, 5.15k forks

After fine-tuning, the same checkpoint gives very different results on the same validation set in evaluate mode and in deployment mode #1437

Open. BirderEric opened this issue 6 months ago

BirderEric commented 6 months ago

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

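After fine-tuning, with the same validation set and the same checkpoint, the predictions produced by the evaluate script differ greatly from the predictions produced through deployment: accuracy is 83% and 39%, respectively. Has anyone else run into the same problem?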

Expected Behavior

No response

Steps To Reproduce

Run ptuning with my own train.json
Predict dev.json with evaluate.sh using the checkpoint
Predict dev.json with the model.chat function using the same checkpoint
The results and the precision differ
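
To narrow down where the two paths diverge, it may help to compare the predictions example by example rather than only looking at the aggregate score. The following is a minimal sketch, assuming the outputs of evaluate.sh and of model.chat have already been collected into two Python lists of strings in the same order as dev.json. The function names `accuracy` and `disagreements` and the toy data are hypothetical and are not part of the ChatGLM-6B repository.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the label (whitespace-stripped)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    hits = sum(p.strip() == l.strip() for p, l in zip(predictions, labels))
    return hits / len(labels)


def disagreements(eval_preds, chat_preds):
    """Indices of examples where the two prediction paths produced different text."""
    if len(eval_preds) != len(chat_preds):
        raise ValueError("both prediction lists must have the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(eval_preds, chat_preds))
        if a.strip() != b.strip()
    ]


# Toy data standing in for dev.json labels and the two sets of outputs (hypothetical).
labels = ["A", "B", "C", "D"]
eval_preds = ["A", "B", "C", "A"]   # e.g. collected from the evaluate.sh run
chat_preds = ["A", "C", "D", "A"]   # e.g. collected from calls to model.chat

print("evaluate accuracy:", accuracy(eval_preds, labels))
print("chat accuracy:", accuracy(chat_preds, labels))
print("examples that differ:", disagreements(eval_preds, chat_preds))
```

Inspecting the examples at the returned indices shows whether the two paths differ in the generated text itself or only in how the output is post-processed before scoring.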

Environment

- OS: Ubuntu 20.04
- Python: 3.9
- Transformers: 4.33.1
- PyTorch: 2.0.1
- CUDA Support: true

Anything else?

No response