HumanEval evaluation code for the instruct model fails for every language except Python: all scores come out as 0

deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

https://coder.deepseek.com/

MIT License


HumanEval evaluation code for the instruct model fails for every language except Python: all scores come out as 0 #82

Closed Nightbringers closed 6 months ago

Nightbringers commented 6 months ago

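Could you provide an example script for another language? My Python scores are normal, but when I change the language in the script to something else, such as Java, the score comes out as 0.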

DejianYang commented 6 months ago

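Could you share your test script? The model's output is saved to a file, and the evaluation is then run on that file. Please first check whether the model's output looks normal.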
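One way to do the check suggested above is to print the first few records of the output file before running the evaluation. The snippet below is only a minimal sketch: the helper name `preview_jsonl` is hypothetical, and it assumes nothing about the record fields beyond each line being a JSON object.

```python
import json
from pathlib import Path


def preview_jsonl(path, limit=3, width=200):
    """Return up to `limit` records from a JSONL file.

    String values longer than `width` characters are truncated so that
    long generations stay readable when printed.
    """
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            records.append({
                key: (value[:width] + "..."
                      if isinstance(value, str) and len(value) > width
                      else value)
                for key, value in record.items()
            })
            if len(records) >= limit:
                break
    return records


# Example call (the path is whatever was passed to --output_path):
# for rec in preview_jsonl("output/java.deepseek-coder-6.7b-instruct.jsonl"):
#     print(rec)
```

If the generated code in these records looks like valid code in the target language, the zero score is more likely to come from the execution environment than from the model.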

Nightbringers commented 6 months ago

LANG="java"
OUPUT_DIR="output"
MODEL="deepseek-coder-6.7b-instruct"

CUDA_VISIBLE_DEVICES=0 python eval_instruct.py \
    --model "deepseek-coder-6.7b-instruct-chat" \
    --output_path "$OUPUT_DIR/${LANG}.$MODEL.jsonl" \
    --language $LANG \
    --temp_dir $OUPUT_DIR

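Hi, this is my test script. Could you check whether anything is wrong with it? Setting the language to python works, but switching to any other language gives a score of 0.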

DejianYang commented 6 months ago

Is deepseek-coder-6.7b-instruct-chat one of our models? Have you checked the model's output file, and does the result look normal?

Nightbringers commented 6 months ago

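Yes, it is your chat model. I looked at the model's output file and it looks fine. Is there something wrong with the script?

I saw this sentence in the introduction: Additionally, for various programming languages, the execution path may differ. Please ensure you update the appropriate paths in the humaneval/execution.py file accordingly. Is there something there that needs to be changed?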

DejianYang commented 6 months ago

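Each language needs its compiler installed and added to the PATH environment variable; otherwise you have to specify the corresponding directory in execution.py. If you debug and look at the error message, you can tell whether the problem is in the evaluation.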
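A quick way to confirm whether the required tools are on PATH is to look them up with `shutil.which`. This is only a sketch: the `TOOLCHAINS` mapping and the `missing_tools` helper are assumptions made for illustration and are not taken from the repository, so adjust the executable names to whatever the evaluation actually calls.

```python
import shutil

# Assumed mapping from language name to the executables it needs.
# These names are illustrative, not read from execution.py.
TOOLCHAINS = {
    "java": ["javac", "java"],
    "js": ["node"],
    "ts": ["tsc", "node"],
    "go": ["go"],
    "cpp": ["g++"],
    "php": ["php"],
}


def missing_tools(language, toolchains=TOOLCHAINS, which=shutil.which):
    """Return the executables for `language` that cannot be found on PATH."""
    return [tool for tool in toolchains.get(language, []) if which(tool) is None]


if __name__ == "__main__":
    for lang in TOOLCHAINS:
        missing = missing_tools(lang)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{lang}: {status}")
```

Any language reported as missing a tool will need either a PATH fix or an explicit directory in execution.py before its score can be anything other than 0.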

Nightbringers commented 6 months ago

How do I specify the corresponding directory in execution.py? Could you give an example?

I dug further and the printed error is: failed: compilation error

DejianYang commented 6 months ago

See https://github.com/deepseek-ai/DeepSeek-Coder/blob/791c8e2c2c5f89032041010efa60776eb4306d58/Evaluation/HumanEval/human_eval/execution.py#L16C1-L21C13

For example: node_exec = "/home/user/node/bin/"