bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

[FEATURE REQUEST] Support HumanEval+ tests for MultiPL-E #193

Open · Randl opened 5 months ago

Randl commented 5 months ago

We added support for HumanEval+ to MultiPL-E some time ago: https://github.com/nuprl/MultiPL-E/blob/main/humaneval_plus/generate_data.py (the generated queries are not stored in the repo since they are too large). It would be nice to support it in the evaluation harness too.
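For context, the extended HumanEval+ test data can be fetched programmatically rather than vendored into the repo. Below is a minimal sketch of what loading it could look like, assuming the `evalplus` package and its `get_human_eval_plus` loader; field names such as `plus_input` are assumptions based on the evalplus data format, not part of this repo's code:

```python
# Minimal sketch: fetch HumanEval+ problems via the evalplus package
# (pip install evalplus). Field names below are assumptions from the
# evalplus data format, not APIs of bigcode-evaluation-harness.
from evalplus.data import get_human_eval_plus

# HumanEval+ keeps the original HumanEval prompts and task IDs but
# attaches many additional test inputs per problem.
problems = get_human_eval_plus()

for task_id, problem in list(problems.items())[:1]:
    print(task_id)                     # e.g. "HumanEval/0"
    print(problem["prompt"][:200])     # same prompt as original HumanEval
    print(len(problem["plus_input"]))  # extra HumanEval+ test inputs (assumed key)
```

MultiPL-E's `generate_data.py` (linked above) is what translates these extended tests into the other MultiPL-E languages, so harness support would presumably consume its output rather than regenerate it.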