bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

[FEATURE REQUEST] Support HumanEval+ tests for MultiPL-E #193

Open · Randl opened 5 months ago

Randl commented 5 months ago

We added support for HumanEval+ to MultiPL-E some time ago: https://github.com/nuprl/MultiPL-E/blob/main/humaneval_plus/generate_data.py (the generated queries are not stored in the repo since they are too large). It would be nice to support it in the evaluation harness too.
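For context, the extended HumanEval+ test data can be fetched programmatically rather than vendored into the repo. Below is a minimal sketch of what loading it could look like, assuming the `evalplus` package and its `get_human_eval_plus` loader; field names such as `plus_input` are assumptions based on the evalplus data format, not part of this repo's code:

```python
# Minimal sketch: fetch HumanEval+ problems via the evalplus package
# (pip install evalplus). Field names below are assumptions from the
# evalplus data format, not APIs of bigcode-evaluation-harness.
from evalplus.data import get_human_eval_plus

# HumanEval+ keeps the original HumanEval prompts and task IDs but
# attaches many additional test inputs per problem.
problems = get_human_eval_plus()

for task_id, problem in list(problems.items())[:1]:
    print(task_id)                     # e.g. "HumanEval/0"
    print(problem["prompt"][:200])     # same prompt as original HumanEval
    print(len(problem["plus_input"]))  # extra HumanEval+ test inputs (assumed key)
```

MultiPL-E's `generate_data.py` (linked above) is what translates these extended tests into the other MultiPL-E languages, so harness support would presumably consume its output rather than regenerate it.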