Why does this model score so much lower on the relevant benchmarks than other open-source Chinese models?

TigerResearch / TigerBot

TigerBot: A multi-language multi-task LLM

https://www.tigerbot.com

Apache License 2.0


Closed: ray075hl closed this issue 9 months ago

ray075hl commented 9 months ago

https://qwenlm.github.io/zh/blog/qwen1.5/ Hi, I noticed that Qwen1.5-7B scores 74.1 on C-Eval, while TigerBot 70B scores only 60.04 on C-Eval. Why is that?

chentigerye commented 9 months ago

Evaluating large models is itself an open problem; we care more about how the model performs in real applications.

ray075hl commented 9 months ago

@chentigerye Thanks for the reply! So you mean we should evaluate the model on our own application scenarios, right?