bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

refactor(evalplus): maintain mbpp+ v0.2.0 #223

Closed: ganler closed this 2 months ago

ganler commented 2 months ago

This pull request updates bigcode-evaluation-harness to use MBPP+ v0.2.0, which drops several more ill-formed tasks (e.g., tasks whose original test lists are wrong). The number of tasks decreases from 399 to 378.
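For reviewers, a minimal sketch of the kind of sanity check that could guard the new task count. The helper names and placeholder task IDs below are hypothetical and are not taken from the harness or from EvalPlus; only the counts 399 and 378 come from this PR.

```python
# Hypothetical sanity check for the MBPP+ task count after the update.
# All names and task IDs here are illustrative placeholders; only the
# counts (399 before, 378 after) come from the PR description.

OLD_TASK_COUNT = 399
NEW_TASK_COUNT = 378


def drop_ill_formed(task_ids, ill_formed_ids):
    """Return task_ids without the flagged ones, preserving order."""
    flagged = set(ill_formed_ids)
    return [tid for tid in task_ids if tid not in flagged]


def check_task_count(task_ids, expected):
    """Raise ValueError if the number of tasks differs from `expected`."""
    actual = len(task_ids)
    if actual != expected:
        raise ValueError(f"expected {expected} tasks, found {actual}")
    return actual


# Placeholder data standing in for the real task lists.
old_tasks = [f"task_{i}" for i in range(OLD_TASK_COUNT)]
removed = old_tasks[: OLD_TASK_COUNT - NEW_TASK_COUNT]  # 21 placeholder IDs
new_tasks = drop_ill_formed(old_tasks, removed)

check_task_count(new_tasks, NEW_TASK_COUNT)
```

A check like this fails loudly if a stale copy of the dataset (still holding 399 tasks) is loaded by mistake.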

cc: @loubnabnl