bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Add SantaCoder FIM task #69

Closed by loubnabnl 9 months ago

loubnabnl commented 1 year ago

Add support for the fill-in-the-middle (FIM) task discussed in this issue on HumanEval, and make sure the numbers match the MultiPL-E implementation, for example for SantaCoder (see Table 7 of this paper). The evaluation harness already supports FIM mode for SantaCoder and InCoder, which the DS-1000 task uses for its insertion mode.
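For readers unfamiliar with FIM prompting, here is a minimal sketch of how a prefix and suffix might be packed into a single prompt and how the generated middle could be scored. The sentinel strings are assumed from the SantaCoder model card, the helper names are hypothetical, and exact-match scoring is an assumption about the benchmark rather than something stated in this thread.

```python
# Sentinel tokens assumed from the SantaCoder model card; other models
# (for example InCoder) use different markers.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"
EOS = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: prefix-suffix-middle ordering, model fills the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str) -> str:
    """Return the text after FIM_MIDDLE, cut at the first EOS marker if present."""
    _, _, middle = decoded.partition(FIM_MIDDLE)
    return middle.split(EOS, 1)[0]


def exact_match(prediction: str, reference: str) -> bool:
    """Assumed metric: whitespace-insensitive equality at the string ends."""
    return prediction.strip() == reference.strip()
```

A task integrated into the harness would presumably build one such prompt per masked line and report the fraction of exact matches.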

arjunguha commented 9 months ago

There is standalone code to do this here:

https://github.com/arjunguha/santacoder_fim_benchmark

It would be great if someone tried to integrate it into the eval harness. :)

maxmatical commented 9 months ago

@arjunguha I'm looking to implement this. Do you have the generation args for SantaCoder used to get the results in Table 7? Are they the same as

GENERATION_ARGS = {
    "do_sample": True,
    "temperature": 0.2,
    "top_p": 0.95,
    "max_new_tokens": 25,
}

Thanks 😄

arjunguha commented 9 months ago

Yes, they're the same as the ones in the file, which are the values you listed.
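As a rough illustration of how the confirmed settings might be wired into an evaluation loop, the sketch below passes them to a pluggable `generate` callable. That callable, `complete_single_line`, and `fake_generate` are hypothetical stand-ins, not part of the harness or the standalone benchmark, and keeping only the first generated line is an assumed post-processing step.

```python
from typing import Callable

# Values quoted in the thread above.
GENERATION_ARGS = {
    "do_sample": True,
    "temperature": 0.2,
    "top_p": 0.95,
    "max_new_tokens": 25,
}


def complete_single_line(generate: Callable[..., str], prompt: str) -> str:
    """Call a hypothetical `generate(prompt, **kwargs)` backend, keep the first line."""
    completion = generate(prompt, **GENERATION_ARGS)
    # Assumed post-processing: discard anything after the first newline.
    return completion.split("\n", 1)[0]


def fake_generate(prompt: str, **kwargs) -> str:
    """Stand-in for a real model call; it does not mirror any library's API."""
    assert kwargs["max_new_tokens"] == 25
    return "return a + b\n    # trailing text"
```

With a real model, `generate` would wrap tokenization, sampling, and decoding; the small `max_new_tokens` budget suits short, single-line infills.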