THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM
Other
15.73k stars 1.85k forks

[BUG/Help] Details about the model and prompt used for the final C-Eval submission #539

Open trangtv57 opened 1 year ago

trangtv57 commented 1 year ago

Is there an existing issue for this?

Current Behavior

I saw the ChatGLM2 entry on the C-Eval leaderboard with an average score of 71, while the zero-shot C-Eval scores reported in the README top out at 61, for the ChatGLM-12B version. So I'm not sure whether ChatGLM-12B with few-shot prompting improves from 61 to 71, or whether the leaderboard entry uses a different model or additional prompt engineering. Can you share the details?