bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Add humaneval+ evaluation task #187

Closed by ganler 5 months ago

ganler commented 5 months ago

This PR addresses #186 to include HumanEval+ in bigcode-evaluation-harness.

The dataset is available at https://huggingface.co/datasets/evalplus/humanevalplus

ganler commented 5 months ago

Thanks for the prompt code review! The comments have been addressed. :D

ganler commented 5 months ago

I also ran a test with starcoderbase-1b and got 11.5 pass@1, which is very close to what I previously measured using EvalPlus :).
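For context, pass@k scores like the one above are typically computed with the unbiased estimator from the Codex paper, 1 - C(n-c, k)/C(n, k), where n is the number of samples per problem and c the number that pass. A minimal sketch (the helper name and the per-problem counts below are illustrative, not taken from this PR):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n=1, k=1), pass@1 reduces to the
# fraction of problems solved. Hypothetical outcomes: 23 of 200 pass.
results = [1] * 23 + [0] * 177
score = 100 * sum(pass_at_k(1, c, 1) for c in results) / len(results)
print(f"pass@1 = {score:.1f}")  # pass@1 = 11.5
```

Averaging this estimator over all problems gives the harness-style percentage score.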
