bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0
702 stars · 180 forks

Supporting HumanEval+ dataset #186

Closed · ganler closed this 5 months ago

ganler commented 5 months ago

Hi, we have just uploaded HumanEval+ to Hugging Face: https://huggingface.co/datasets/evalplus/humanevalplus, in a format that is 100% compatible with the original HumanEval. Since the format is HumanEval-compatible, I am thinking of having it supported in bigcode-evaluation-harness as well.

```python
# Load from the Hugging Face Hub
from datasets import load_dataset

dataset = load_dataset("evalplus/humanevalplus")
```
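Since the format is HumanEval-compatible, records should carry the usual HumanEval fields (`task_id`, `prompt`, `entry_point`, `canonical_solution`, `test`) and can be evaluated the same way: concatenate the prompt, a completion, and the test code into one program and execute it. A minimal sketch, using a hypothetical hand-written record (not an actual dataset entry) so it runs offline:

```python
# Hypothetical record mirroring the HumanEval field layout; real records
# would come from load_dataset("evalplus/humanevalplus") instead.
problem = {
    "task_id": "Example/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "entry_point": "add",
    "canonical_solution": "    return a + b\n",
    "test": "def check(candidate):\n    assert candidate(1, 2) == 3\n",
}

# Use the canonical solution as a stand-in for a model completion.
program = (
    problem["prompt"]
    + problem["canonical_solution"]
    + "\n"
    + problem["test"]
    + "\n"
    + f"check({problem['entry_point']})"
)

# Executes the assembled program; raises AssertionError if the candidate fails.
exec(program, {})
print("passed")
```

This mirrors the usual HumanEval execution convention (prompt + completion + test + `check(entry_point)`); the harness additionally sandboxes execution, which this sketch omits.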

[screenshot: dataset viewer output]

We also manually tested the validity and obtained exactly the same scores as when using evalplus directly.

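For reference, the score both tools report is typically the unbiased pass@k estimator from the HumanEval paper, where `n` is the number of sampled completions per problem and `c` the number that pass. A minimal sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (HumanEval paper):
    n = total samples per problem, c = correct samples, k = budget."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# With k=1 the estimator reduces to the plain success rate c/n,
# e.g. 10 samples with 3 passing gives pass@1 of 0.3.
print(pass_at_k(10, 3, 1))
```

The per-problem estimates are then averaged across the benchmark to give the reported score.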

A few notes regarding integration:

- We are also going to support MBPP+ in a compatible format soon.

loubnabnl commented 5 months ago

PR merged, thanks for the integration! numpy is already a dependency of datasets so it should be in the environment.