bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

Show metric in outfile #24

Closed: Muennighoff closed this issue 1 year ago

Muennighoff commented 1 year ago
{
  "codexglue_code_to_text-python-left": 0.06565988797511521,
  "config": {
    "model": "bigcode/christmas-models"
  }
}

It would be better to also include the metric name in the outfile, imo.
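
For illustration, the outfile could then nest each score under its metric name, along the lines of the sketch below; the smoothed_bleu_4 key is purely hypothetical and only stands in for whatever name the task's metric actually uses.

# Hypothetical outfile shape with the metric name included;
# "smoothed_bleu_4" is an illustrative key, not necessarily the real one.
expected_outfile = {
    "codexglue_code_to_text-python-left": {
        "smoothed_bleu_4": 0.06565988797511521
    },
    "config": {"model": "bigcode/christmas-models"},
}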

loubnabnl commented 1 year ago

That makes sense. How do you suggest including it? Add an arg to the task class, or output the name of the metric after calling it in postprocess_results, for example?

Muennighoff commented 1 year ago

Yeah, I would return the metric name that we provide to evaluate.load, maybe something like:

code_metric = load("code_eval")
results, _ = code_metric.compute(
    references=references,
    predictions=generations,
)
return {
    "code_eval": results,
}
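
With a return value like that, the evaluator can write the scores into the outfile already keyed by the metric name. A minimal sketch of that writing step, assuming hypothetical names (write_outfile, task_results) rather than the harness's actual API:

import json

# Hypothetical sketch: merge the metric-keyed task results with the run
# config and dump them to the outfile; all names here are illustrative.
def write_outfile(task_name, task_results, model_name, path="evaluation_results.json"):
    outfile = {
        task_name: task_results,  # e.g. {"code_eval": {"pass@1": 0.25}}
        "config": {"model": model_name},
    }
    with open(path, "w") as fp:
        json.dump(outfile, fp, indent=2)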