Closed SefaZeng closed 2 months ago
https://evalplus.github.io/leaderboard.html https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard
Please refer to some third-party benchmarks, similar to the ones mentioned above, to check for any differences.
Hi, could this repo be used to evaluate the Qwen1.5 models?
We recommend using Qwen2. We have run and tested the evaluation on Qwen2 and can confirm that the results are essentially aligned.
Using Qwen2-1.5B directly, the results I get are 10 points lower than those in the technical report, and the Qwen1.5 results are also very low.