In your blog and paper you report an 88.4 score for the 7B instruction-tuned model on the HumanEval benchmark. However, `evaluation/eval_plus/released/results/humaneval/codeqwen_chat.txt` shows only 0.835 (i.e., 83.5, roughly 5 points lower), and I get exactly the same number when running the evaluation locally. Could you please elaborate on this discrepancy and help me reproduce the reported numbers?
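For reference, here is roughly how I produced the 0.835 locally, in case the gap comes from my setup. This is a minimal sketch against the public EvalPlus API; the model id, chat-template usage, greedy decoding, and `max_new_tokens` are my assumptions and may well differ from the settings behind the reported 88.4:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from evalplus.data import get_human_eval_plus, write_jsonl

# Assumption: this is the 7B instruction-tuned checkpoint the blog refers to.
model_id = "Qwen/CodeQwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

samples = []
for task_id, problem in get_human_eval_plus().items():
    # Wrap the raw HumanEval prompt in the chat template, since this is the chat model.
    messages = [{"role": "user", "content": problem["prompt"]}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Greedy decoding; I did not tune sampling parameters.
    out = model.generate(inputs, max_new_tokens=512, do_sample=False)
    completion = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    # "solution" = full function body as emitted by the chat model.
    samples.append({"task_id": task_id, "solution": completion})

write_jsonl("samples.jsonl", samples)
```

I then sanitized and scored with the standard EvalPlus CLI (`evalplus.sanitize --samples samples.jsonl`, then `evalplus.evaluate --dataset humaneval --samples samples-sanitized.jsonl`), which reports the same 0.835 pass@1. If the reported 88.4 used different prompting, decoding, or post-processing, could you share those details?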