Does the chat model score higher than the base model on C-Eval?

QwenLM / Qwen2

Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.


Does the chat model score higher than the base model on C-Eval? #849

Open Tramac opened 4 weeks ago

Tramac commented 4 weeks ago

Base model: Qwen-1.8B
Chat model: Qwen-1.8B-Chat
Benchmark: C-Eval

Results:

# Base model
        Average: 62.63
           STEM: 57.21
Social Sciences: 75.27
     Humanities: 64.20
          Other: 58.59

# Chat model
        Average: 58.17
           STEM: 55.12
Social Sciences: 67.64
     Humanities: 62.65
          Other: 51.82

The Chat model scores slightly lower than the Base model on C-Eval. Is this normal, and if so, what is the reason?
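
To put a number on the gap, the per-category differences can be computed directly from the scores reported above. This is only a minimal sketch: the dictionary and variable names are illustrative and are not taken from any evaluation script.

```python
# Scores copied from the results reported above (C-Eval, in percent).
base = {
    "Average": 62.63,
    "STEM": 57.21,
    "Social Sciences": 75.27,
    "Humanities": 64.20,
    "Other": 58.59,
}
chat = {
    "Average": 58.17,
    "STEM": 55.12,
    "Social Sciences": 67.64,
    "Humanities": 62.65,
    "Other": 51.82,
}

# Chat minus Base per category; a negative value means Chat scored lower.
delta = {name: round(chat[name] - base[name], 2) for name in base}

for name, diff in delta.items():
    print(f"{name:>15}: {diff:+.2f}")
```

On these numbers the Chat model is lower in every category, with the largest drop in Social Sciences (7.63 points) and the smallest in Humanities (1.55 points).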

Alwin4Zhang commented 4 weeks ago

Yes, this is normal.