[BUG/Help] Details about the model and prompt used for the final C-Eval submission

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

I saw that ChatGLM2 has an average score of 71 on the C-Eval leaderboard. However, the zero-shot C-Eval scores reported in the README top out at 61, for ChatGLM-12B. So I'm not sure whether the leaderboard result comes from ChatGLM-12B with few-shot prompting (improving from 61 to 71), from a different model, or from prompt engineering. Could you share the details?

THUDM / ChatGLM2-6B, issue #539
