RUCAIBox / LLMBox

A comprehensive library for implementing LLMs, including a unified training pipeline and comprehensive model evaluation.
MIT License

eval log format issue #258

Closed: xansar closed this 2 months ago

xansar commented 2 months ago

While testing on ceval, I found that the question stems and options recorded in the evaluation results are garbled: (screenshot)

This is likely related to the call to `dump_conversations` at line 175 of `LLMBox/utilization/utils/log_results.py`: `evaluation_instances = dump_conversations(evaluation_instances, local_model)`. The `evaluation_instances` passed in here is a `List[Tuple[str, str]]`, and inside `dump_conversations` each tuple is coerced to `str`. As a result, line 207, `*source_texts, target_text = zip(*evaluation_instances)`, goes wrong when extracting the question stems and options. After commenting out line 175, the ceval results are logged correctly again, but I have not verified whether the problem also affects other datasets: (screenshot)
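For reference, here is a minimal, self-contained sketch of the failure mode described above. The sample data and variable names are illustrative, not taken from the LLMBox source:

```python
# Hypothetical (source, target) pairs, mimicking the reported
# List[Tuple[str, str]] structure of evaluation_instances.
evaluation_instances = [
    ("stem and options for question 1", "A"),
    ("stem and options for question 2", "C"),
]

# Intended behaviour: unzip the pairs into source and target columns.
*source_texts, target_text = zip(*evaluation_instances)
# source_texts -> [("stem and options for question 1", "stem and options for question 2")]
# target_text  -> ("A", "C")

# If each tuple is first coerced to str (as reportedly happens inside
# dump_conversations), zip(*...) iterates character by character over
# the stringified tuples instead of splitting the pairs.
stringified = [str(pair) for pair in evaluation_instances]
*source_texts, target_text = zip(*stringified)
# source_texts now holds tuples of single characters, which is why the
# logged question stems and options come out garbled.
```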

huyiwen commented 2 months ago

Thanks for the report, we'll fix it shortly!