bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Update MultiPL-E prompts #96

Closed arjunguha closed 1 year ago

arjunguha commented 1 year ago

This patch updates the version of MultiPL-E used in the evaluation harness to the one used in the StarCoder paper. The previous version contained the prompts used for the SantaCoder paper.

For the StarCoder paper, we fixed a number of bugs that were artificially lowering the scores for a couple of languages, almost all of them typed languages. Most notably, Java's performance on StarCoderBase will go up by about 2-3% with this update.