QOL changes for generations

bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Apache License 2.0


QOL changes for generations #166

Closed: maxmatical closed this 6 months ago

maxmatical commented 7 months ago

append task_name for multiple tasks
fix n_tasks logic to prevent dataset index out-of-bounds errors
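A minimal sketch of the kind of bound check the n_tasks fix is about. It assumes `limit` and `limit_start` behave like the harness's `--limit` and `--limit_start` options; the helper `compute_n_tasks` is hypothetical and is not the code from this PR.

```python
from typing import Optional


def compute_n_tasks(dataset_len: int, limit: Optional[int], limit_start: int = 0) -> int:
    """Hypothetical helper: number of problems to run, clamped to the dataset size.

    Without the clamp, limit_start + limit can exceed dataset_len, and indexing
    the dataset past its end raises an IndexError.
    """
    if not 0 <= limit_start <= dataset_len:
        raise ValueError(f"limit_start={limit_start} is outside [0, {dataset_len}]")
    remaining = dataset_len - limit_start  # problems left after skipping limit_start
    if limit is None:
        return remaining
    return min(limit, remaining)


# HumanEval has 164 problems: starting at 150 leaves only 14, even if limit asks for 50.
print(compute_n_tasks(164, limit=50, limit_start=150))
print(compute_n_tasks(164, limit=None))
```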

The next 2 changes are related to #58:

save intermediate generations with --save_every_k_samples
resume generation from intermediate generations with --load_generations_intermediate_paths
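A rough sketch of periodic checkpointing, assuming (as the test log suggests) that "samples" counts problems and that each checkpoint overwrites one JSON file with every generation so far. The function `generate_and_checkpoint` and the file name are hypothetical, not the harness's implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def generate_and_checkpoint(
    prompts: List[str],
    generate_fn: Callable[[str], List[str]],
    intermediate_path: Path,
    save_every_k_samples: int,
) -> List[List[str]]:
    """Generate one list of samples per prompt, saving progress every k prompts."""
    generations: List[List[str]] = []
    for idx, prompt in enumerate(prompts):
        generations.append(generate_fn(prompt))
        # After every k-th problem, overwrite the checkpoint with all generations so far.
        if (idx + 1) % save_every_k_samples == 0:
            intermediate_path.write_text(json.dumps(generations))
    return generations


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "generations_intermediate.json"  # hypothetical file name
    prompts = [f"problem {i}" for i in range(164)]  # HumanEval-sized input
    gens = generate_and_checkpoint(prompts, lambda p: [p + " completion"], path, 50)
    saved = json.loads(path.read_text())

print(len(gens), len(saved))
```

With k = 50 and 164 problems, the last checkpoint is written after problem 150, so a crash after that point would leave 150 saved generations, which matches the counts in the test log.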

Tested on HumanEval, saving every 50 samples and then resuming from the intermediate generations:

len(intermediate_generations) = 150
should be generating 14 new samples for new_generations
curr_sample_idx = 150
number of problems for this task is 14
len(dataloader)= 14
len(code_gens) = 14

len(new_generations) = 14
len(generations) after concatenating = 164
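The log above amounts to: resume at index len(intermediate_generations), generate only the remaining problems, then concatenate. A minimal sketch under that assumption; `resume_generations` and `fake_generate` are hypothetical names, and the variable names only mirror the log.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def resume_generations(
    prompts: List[str],
    generate_fn: Callable[[str], List[str]],
    intermediate_path: Path,
) -> List[List[str]]:
    intermediate_generations: List[List[str]] = json.loads(intermediate_path.read_text())
    curr_sample_idx = len(intermediate_generations)  # first problem with no generation yet
    # Same effect as running with limit_start=curr_sample_idx on the remaining problems.
    new_generations = [generate_fn(p) for p in prompts[curr_sample_idx:]]
    return intermediate_generations + new_generations


calls: List[str] = []


def fake_generate(prompt: str) -> List[str]:
    calls.append(prompt)  # record which problems were actually generated
    return [f"{prompt} completion"]


prompts = [f"problem {i}" for i in range(164)]  # HumanEval has 164 problems
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "intermediate.json"
    # Pretend an earlier run checkpointed the first 150 problems.
    path.write_text(json.dumps([[f"{p} completion"] for p in prompts[:150]]))
    generations = resume_generations(prompts, fake_generate, path)

print(len(calls), len(generations))
```

Only the 14 remaining problems are sent to the generator, and 150 + 14 = 164 restores the full task.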

Verified:

loading from intermediate generations produces the same output as running with --limit_start 150 on HumanEval
Saved generations match final generations
Eval metrics unchanged

(minor) add some type hints and linting fixes