Closed kartikzheng closed 3 weeks ago
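I evaluated Qwen2.5-Coder-7B-Instruct on the SantaCoder FIM dataset and found that its score is about 10 percentage points lower than on the HumanEval evaluation. Python is affected most, with pass@1 at only about 50%. By comparison, other code LLMs show no obvious drop in pass@1.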
Could you please provide us with the specific evaluation script and the prompts that have been tested? We will check it out.
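For anyone reproducing this, the prompt format is usually the first thing worth checking in a fill-in-the-middle evaluation. The snippet below is a minimal sketch of a prefix-suffix-middle prompt, not the script used in this issue. The special token strings and the helper names `build_fim_prompt` and `splice_completion` are assumptions for illustration; verify the tokens against the tokenizer you actually load.

```python
# Assumed FIM special tokens; confirm them against the loaded tokenizer's vocabulary.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle prompt; the model is expected to generate the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice_completion(prefix: str, completion: str, suffix: str) -> str:
    """Rebuild the full program that the test harness would execute."""
    return prefix + completion + suffix


# Hypothetical example: the hole is the expression after `return`.
prefix = "def add(a, b):\n    return "
suffix = "\n\nprint(add(1, 2))\n"
prompt = build_fim_prompt(prefix, suffix)
program = splice_completion(prefix, "a + b", suffix)
```

Sharing the exact string produced at this step, along with whether a chat template was applied on top of it, would make a gap like the one reported easier to diagnose.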
Closed due to no reply.