bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Add humaneval+ evaluation task #187

Closed by ganler 5 months ago

ganler commented 5 months ago

This PR addresses #186 to include HumanEval+ in bigcode-evaluation-harness.

The dataset is available at https://huggingface.co/datasets/evalplus/humanevalplus

ganler commented 5 months ago

Thanks for the prompt code review! The comments have been addressed. :D

ganler commented 5 months ago

I also ran a test with starcoderbase-1b and got 11.5 pass@1, which is very close to what I previously measured using EvalPlus :).
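For context, pass@k scores like the one above are typically computed with the unbiased estimator from the Codex paper, 1 - C(n-c, k)/C(n, k), where n is the number of samples per problem and c the number that pass. A minimal sketch (the helper name and the per-problem counts below are illustrative, not taken from this PR):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With one sample per problem (n=1, k=1), pass@1 reduces to the
# fraction of problems solved. Hypothetical outcomes: 23 of 200 pass.
results = [1] * 23 + [0] * 177
score = 100 * sum(pass_at_k(1, c, 1) for c in results) / len(results)
print(f"pass@1 = {score:.1f}")  # pass@1 = 11.5
```

Averaging this estimator over all problems gives the harness-style percentage score.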
