Closed: loubnabnl closed this 12 months ago
There is standalone code to do this here:
https://github.com/arjunguha/santacoder_fim_benchmark
It would be great if someone tried to integrate it into the eval harness. :)
@arjunguha I'm looking to implement this. Do you have the generation args for SantaCoder that were used to get the results in Table 7? Are they the same as these?
GENERATION_ARGS = {
    "do_sample": True,
    "temperature": 0.2,
    "top_p": 0.95,
    "max_new_tokens": 25,
}
Thanks 😄
Yes, they are the same ones as in the file, i.e. the ones you listed.
Add support for the FIM task discussed in this issue on HumanEval, and make sure the numbers match the MultiPL-E implementation, for example for SantaCoder (see Table 7 of this paper). The evaluation harness already supports FIM mode for SantaCoder and InCoder, which the DS-1000 task uses for insertion mode.
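
For anyone picking this up, here is a rough sketch of what a single-line FIM example could look like: the prompt is assembled from the code before and after the hole, and the generation is compared against the reference line. The sentinel token spellings, the helper names (`build_fim_prompt`, `exact_match`, `exact_match_rate`), and the use of whitespace-stripped exact match as the metric are all assumptions for illustration, not taken from the harness or the benchmark repo; check the model's tokenizer and the standalone code before relying on them.

```python
from typing import Iterable, Tuple

# Assumed SantaCoder-style sentinel tokens; verify against the model's tokenizer.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Build a prefix-suffix-middle prompt; the model generates the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def exact_match(generated: str, reference: str) -> bool:
    """Compare the generated middle to the reference, ignoring outer whitespace."""
    return generated.strip() == reference.strip()


def exact_match_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (generated, reference) pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(exact_match(g, r) for g, r in pairs) / len(pairs)


if __name__ == "__main__":
    prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
    print(repr(prompt))
    print(exact_match_rate([("a + b", "a + b"), ("a - b", "a + b")]))
```

The prompt string would then go to the model with the GENERATION_ARGS quoted above, and only the newly generated tokens would be scored.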