bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0
825 stars 219 forks

Add mbpp+ evaluation task #190

Closed ganler closed 9 months ago

ganler commented 10 months ago

Similar to #187, we have converted MBPP+ into a format close to the original MBPP format, which makes it compatible with bigcode-evaluation-harness.

The MBPP+ dataset is also available on the Hub: https://huggingface.co/datasets/evalplus/mbppplus
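For readers who want to poke at the data outside the harness, here is a minimal sketch of how an MBPP-style record could be turned into a prompt and checked against its assertions. It is not the code in this PR: `load_mbppplus`, `build_prompt`, and `passes` are hypothetical helpers, the `"test"` split name and the `prompt` / `test_list` field names are assumptions about the Hub dataset, and the sample record is made up for illustration.

```python
from typing import Dict, List


def load_mbppplus(split: str = "test"):
    """Load MBPP+ from the Hub (needs the `datasets` package and network access).

    The split name is an assumption; check the dataset card before relying on it.
    """
    from datasets import load_dataset  # lazy import so the helpers below work offline

    return load_dataset("evalplus/mbppplus", split=split)


def build_prompt(description: str, test_example: str) -> str:
    """Build a docstring-style prompt: task description plus one example assertion."""
    return f'"""\n{description}\n{test_example}\n"""\n'


def passes(candidate: str, tests: List[str]) -> bool:
    """Run a candidate solution against assertion strings.

    Unsandboxed exec: for illustration only, never use on untrusted generations.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(candidate, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        return False
    return True


# Made-up record in an MBPP-like shape (field names are assumptions).
record = {
    "prompt": "Write a function to add two numbers.",
    "test_list": ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"],
}
prompt = build_prompt(record["prompt"], record["test_list"][0])
```

With a real download, something like `ds = load_mbppplus()` followed by `build_prompt(ds[0]["prompt"], ds[0]["test_list"][0])` should behave the same way, provided the assumed field names match the dataset card. The harness itself handles sandboxed execution and pass@k scoring, which this sketch deliberately leaves out.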

Tested on deepseekcoder-1.3b-base; the results reproduce the scores on the EvalPlus leaderboard (screenshot not preserved).

cc: @loubnabnl