Feature proposal/request: include the "config" args when running in generation_only mode
Motivation
I run generations on several remote systems and then run the evaluations locally on my workstation. Currently the filename is the only place I can keep track of the exact generation parameters, and I then have to run a really long eval command that repeats all of the same parameters.
pros
never mix up runs
don't use filenames for data
could be much easier to run eval only (just have to give it a .json that already includes the task, limit, model, etc)
cons
likely breaks some existing downstream scripts
tiny bit of redundant data
relevant code
It seems that in generation_only mode only the generations are saved (save_generations), so the args are lost.
https://github.com/bigcode-project/bigcode-evaluation-harness/blob/1b0147c50f406ff66ac4f806230479f31d19c7e6/main.py#L400-L408
The generations.json is just a list of lists of strings, but it could easily hold the same config as the eval_results.json.
You would also need a bit of code to read these args in eval_only mode.
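To make the idea concrete, here is a minimal sketch of what writing and reading such a file could look like. This is not the harness's code: the function names `save_generations_with_config` and `load_generations` are hypothetical, and the top-level keys `"config"` and `"generations"` are assumptions about a possible layout. The loader falls back to the current bare-list format so existing files would keep working.

```python
import json
from pathlib import Path


def save_generations_with_config(path, generations, config):
    """Write generations and the run config into a single JSON file.

    Hypothetical helper: the "config"/"generations" keys are an assumed layout.
    """
    payload = {"config": config, "generations": generations}
    Path(path).write_text(json.dumps(payload, indent=2))


def load_generations(path):
    """Return (generations, config); config is None for legacy files."""
    data = json.loads(Path(path).read_text())
    if isinstance(data, list):
        # Existing format: a bare list of lists of strings, no config stored.
        return data, None
    # Proposed format: a dict holding both the generations and the config.
    return data["generations"], data.get("config")
```

In eval_only mode the returned config could then fill in any args that were not given on the command line, while a `None` config would mean falling back to today's behaviour.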
Also, if the eval run crashes, you will have generations but no eval results saved, and therefore no record of the config.
Alternatively there could be a config.json file that keeps track of these when in generation_only (or an empty eval_results)... but that still leaves you with multiple files - instead of having all of it contained at once.
This is a bit of an RFC before I try to implement this in a PR myself, especially on how to make it compatible with existing formats.