bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

[FR] include "config" data in generations_only #226

Open Vipitis opened 2 months ago

Vipitis commented 2 months ago

Feature proposal/request: include the "config" args in the saved output when running with generations_only.

Motivation

I run generations on several remote systems and then run the evaluations locally on my workstation. Currently the filename is the only place where I can keep track of the exact parameters used for the generations. For the evals I then have to run a really long command that repeats all of the same parameters.

pros

cons

relevant code

It seems that in generation_only mode only the generations are written out (save_generations), so the args are lost. The generations.json is just a list of lists of strings, but it could easily hold the same config as the eval_results.json. A bit of code would also be needed to read these args back in eval_only mode. The same gap shows up if the eval run crashes: you will have generations but no eval results saved. Relevant lines: https://github.com/bigcode-project/bigcode-evaluation-harness/blob/1b0147c50f406ff66ac4f806230479f31d19c7e6/main.py#L400-L408
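As a rough sketch of the idea (not existing harness code): the generations file could become a dict holding both the config and the generations, while the loader still accepts the current bare list. The function names, the "config" and "generations" keys, and the example config values below are all hypothetical.

```python
import json


def save_generations_with_config(path, generations, config):
    """Write the generations and the run config into a single JSON file."""
    # Hypothetical layout: {"config": {...}, "generations": [[...], ...]}
    payload = {"config": config, "generations": generations}
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)


def load_generations(path):
    """Return (generations, config); config is None for the existing format."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, list):
        # Existing format: a bare list of lists of strings, so no config.
        return data, None
    return data["generations"], data.get("config")
```

Checking the type of the top-level JSON value is one way to stay compatible with existing generations.json files, since those would keep loading unchanged and simply report no config.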

Alternatively, a separate config.json file (or an empty eval_results) could keep track of these args in generation_only mode. But that still leaves you with multiple files instead of having everything contained in one.
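For comparison, the separate-file variant might look roughly like this. It is only a sketch: the "_config.json" naming scheme and the helper names are assumptions, not something the harness defines.

```python
import json
from pathlib import Path


def sidecar_path(generations_path):
    """Derive the config file path that sits next to a generations file."""
    p = Path(generations_path)
    # Hypothetical naming: generations.json -> generations_config.json
    return p.with_name(p.stem + "_config.json")


def save_sidecar_config(generations_path, config):
    """Write the run config next to the generations file and return its path."""
    path = sidecar_path(generations_path)
    path.write_text(json.dumps(config, indent=2))
    return path


def load_sidecar_config(generations_path):
    """Return the stored config, or None if no config file exists."""
    path = sidecar_path(generations_path)
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

This leaves the generations.json format untouched, at the cost of the two files having to be copied around together.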


This is a bit of an RFC before I try to implement it in a PR myself. I am especially interested in feedback on how to keep it compatible with the existing formats.