bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

SantaCoder FIM task #164

Closed · maxmatical closed this 7 months ago

maxmatical commented 7 months ago

PR for #69

Results from the eval:

{
  "santacoder_fim": {
    "java Exact Match": 0.6123595505617978,
    "js Exact Match": 0.5972574911122397,
    "py Exact Match": 0.4573346116970278
  }
}

For comparison, Table 7 of the SantaCoder paper reports: Java 0.62, JS 0.60, Py 0.44.
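For context, here is a minimal sketch of what a SantaCoder-style FIM exact-match evaluation does; the sentinel strings follow the SantaCoder paper, and the helper names are illustrative rather than the harness's actual API:

```python
# Illustrative sketch of SantaCoder-style FIM evaluation (not the harness's exact code).
# The model is prompted with the prefix and suffix wrapped in FIM sentinel tokens and
# must generate the missing middle; the score is the fraction of examples where the
# stripped generation exactly matches the ground-truth middle span.

FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # PSM (prefix-suffix-middle) ordering as used by SantaCoder
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

def exact_match(generations: list[str], references: list[str]) -> float:
    assert len(generations) == len(references)
    hits = sum(gen.strip() == ref.strip() for gen, ref in zip(generations, references))
    return hits / len(references)
```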

Also added a small refactor to make reusing `task._stop_at_stop_token` easier.
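For reference, a stop-token truncation helper of the kind `_stop_at_stop_token` refers to typically looks like the sketch below (a hedged illustration of the behaviour the refactor makes reusable, not necessarily the harness's exact implementation):

```python
def stop_at_stop_token(decoded_string: str, stop_tokens: list[str]) -> str:
    """Truncate a generation at the earliest occurrence of any stop token.

    Illustrative sketch: post-processes a decoded completion so that text after
    the first stop token (e.g. "\nclass", "\ndef") is discarded.
    """
    min_stop_index = len(decoded_string)
    for stop_token in stop_tokens:
        stop_index = decoded_string.find(stop_token)
        if stop_index != -1 and stop_index < min_stop_index:
            min_stop_index = stop_index
    return decoded_string[:min_stop_index]
```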

loubnabnl commented 7 months ago

Thanks! I think we're good to merge. (If you have time later, it would be nice to add a CodeLlama task for people who want to evaluate the CodeLlama family of models with FIM.)
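For anyone picking this up later: a CodeLlama FIM task would differ mainly in the sentinel tokens and prompt layout. A rough sketch is below; the sentinel strings follow the infilling format described in the Code Llama paper and should be verified against the released tokenizer's special tokens before adding the task:

```python
def build_codellama_fim_prompt(prefix: str, suffix: str) -> str:
    # Rough sketch of a Code Llama infilling prompt in prefix-suffix-middle order.
    # The exact sentinel strings (and any leading-space handling) should be checked
    # against the model's tokenizer before use.
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"
```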