bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

MultiPL-E Go test file name suffix does not contain _test.go #224

Open hitesh-1997 opened 7 months ago

hitesh-1997 commented 7 months ago

Hi Team, I was using the bigcode-evaluation-harness to evaluate Go generations on the MultiPL-E dataset and found that every evaluation produced the output ? command-line-arguments [no test files], even though status_code = 0. On debugging further, it looks like we set self.language here, instead of prompt_name['language'], in the problem dict that is passed downstream for execution. When the language is then checked against the evaluators here, the file is written without the _test.go suffix, so go test does not detect any test files.
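A minimal sketch of the suffix handling I would expect, assuming the evaluator derives the temporary file name from the language key. The helper names temp_file_suffix and write_program are hypothetical and not part of the harness; the only firm fact here is that go test treats a file as a test file only when its name ends in _test.go.

```python
import tempfile
from pathlib import Path


def temp_file_suffix(language: str) -> str:
    """Hypothetical helper: pick the temp-file suffix for a language key."""
    # `go test` only picks up files whose names end in _test.go, so a Go
    # program that carries its own tests needs that suffix, not plain ".go".
    if language == "go":
        return "_test.go"
    return f".{language}"


def write_program(language: str, program: str, directory: str) -> Path:
    """Write `program` to a new file in `directory` using the chosen suffix."""
    with tempfile.NamedTemporaryFile(
        mode="w",
        suffix=temp_file_suffix(language),
        dir=directory,
        delete=False,
    ) as f:
        f.write(program)
    return Path(f.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = write_program("go", "package strlen_test\n", d)
        # The file name should now end in _test.go.
        print(path.name)
```

With a plain ".go" suffix the same program would be written to something like tmpabc.go, which go test would skip as a non-test file, and that would match the [no test files] output above.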

To make this easy to reproduce, I have added a video below that evaluates a single Go generation test case (generated with DeepSeek Coder).

generations_go_example.json

[
    [
        "package strlen_test\n\nimport (\n    \"testing\"\n    \"fmt\"\n)\n\n// Return length of given string\n// >>> strlen(\"\")\n// 0\n// >>> strlen(\"abc\")\n// 3\nfunc strlen(myString string) int {\n    return len(myString)\n}\n"
    ]
]

https://github.com/bigcode-project/bigcode-evaluation-harness/assets/20701220/c57dd498-b7f8-488a-a842-f9eb405f1f0d