QwenLM / Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

Does the Chat model score higher than the Base model on C-Eval? #849

[Open] Tramac opened this issue 4 weeks ago

Tramac commented 4 weeks ago

Base model: Qwen-1.8B
Chat model: Qwen-1.8B-Chat
Benchmark: C-Eval

Results:

# Base model
        Average: 62.63
           STEM: 57.21
Social Sciences: 75.27
     Humanities: 64.20
          Other: 58.59

# Chat model
        Average: 58.17
           STEM: 55.12
Social Sciences: 67.64
     Humanities: 62.65
          Other: 51.82
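A minimal Python sketch for tabulating the Base-to-Chat change per category, using the scores exactly as reported in this issue. The names `BASE`, `CHAT`, and the helper `score_deltas` are hypothetical and for illustration only; they are not part of the Qwen2 repository or any C-Eval tooling.

```python
# C-Eval scores copied verbatim from this issue (accuracy, in percent).
BASE = {"Average": 62.63, "STEM": 57.21, "Social Sciences": 75.27,
        "Humanities": 64.20, "Other": 58.59}
CHAT = {"Average": 58.17, "STEM": 55.12, "Social Sciences": 67.64,
        "Humanities": 62.65, "Other": 51.82}


def score_deltas(base: dict, chat: dict) -> dict:
    """Hypothetical helper: Chat minus Base for each category, in points.

    Rounded to 2 decimals to strip floating-point noise from the subtraction.
    """
    return {k: round(chat[k] - base[k], 2) for k in base}


if __name__ == "__main__":
    deltas = score_deltas(BASE, CHAT)
    for name, delta in deltas.items():
        # Right-align category names; show the sign explicitly.
        print(f"{name:>15}: {delta:+.2f}")
    worst = min(deltas, key=deltas.get)
    print(f"Largest drop: {worst} ({deltas[worst]:+.2f})")
```

With these numbers every category drops: Average by 4.46 points, with the largest gap in Social Sciences (-7.63) and the smallest in Humanities (-1.55).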

The Chat model scores slightly lower than the Base model on C-Eval. Is this expected? If so, what is the reason?

Alwin4Zhang commented 4 weeks ago

Yes, that's normal.