bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0

QOL changes for generations #166

maxmatical closed 6 months ago

maxmatical commented 7 months ago
  1. Append task_name when running multiple tasks
  2. Fix the n_tasks logic that caused dataset index-out-of-bounds errors

The next two changes relate to #58:

  1. Save intermediate generations with --save_every_k_samples
  2. Resume generation from intermediate generations with --load_generations_intermediate_paths
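
As a rough illustration of what periodic saving could look like, here is a minimal sketch. It is not the code from this PR: the function name `generate_with_checkpoints`, the `generate_fn` callback, and the JSON file layout are all hypothetical; only the idea of saving every k samples comes from the description above.

```python
import json
import os
import tempfile
from typing import Callable, List


def generate_with_checkpoints(
    prompts: List[str],
    generate_fn: Callable[[str], List[str]],  # hypothetical: returns the samples for one problem
    save_every_k_samples: int,
    out_path: str,
) -> List[List[str]]:
    """Generate for each prompt, writing all generations so far every k problems."""
    generations: List[List[str]] = []
    for i, prompt in enumerate(prompts, start=1):
        generations.append(generate_fn(prompt))
        # Overwrite the intermediate file after every k completed problems.
        if save_every_k_samples > 0 and i % save_every_k_samples == 0:
            with open(out_path, "w") as f:
                json.dump(generations, f)
    return generations
```

With 5 prompts and `save_every_k_samples=2`, the intermediate file in this sketch is last written after the 4th problem, so it holds 4 entries while the returned list holds all 5.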

Tested on HumanEval, saving every 50 samples and then loading from the intermediate generations:

len(intermediate_generations) = 150
should be generating 14 new samples for new_generations
curr_sample_idx = 150
number of problems for this task is 14
len(dataloader)= 14
len(code_gens) = 14

len(new_generations) = 14
len(generations) after concatenating = 164

Verified:

  1. (minor) Add some typing and linting
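
The resume arithmetic in the test log above (150 intermediate generations, 14 new ones, 164 after concatenating) can be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `resume_generations`, `generate_fn`, and the placeholder prompts are hypothetical names; only the counts are taken from the log.

```python
from typing import Callable, List


def resume_generations(
    intermediate_generations: List[List[str]],
    prompts: List[str],
    generate_fn: Callable[[str], List[str]],  # hypothetical generation callback
) -> List[List[str]]:
    """Skip problems that already have generations and generate only the rest."""
    # The number of saved generations is the index of the first unfinished problem.
    curr_sample_idx = len(intermediate_generations)
    remaining = prompts[curr_sample_idx:]
    new_generations = [generate_fn(p) for p in remaining]
    # Concatenate saved and new generations in problem order.
    return intermediate_generations + new_generations


# Placeholder data matching the counts in the log: 164 problems, 150 already done.
prompts = [f"problem_{i}" for i in range(164)]
intermediate = [[f"solution_{i}"] for i in range(150)]
generations = resume_generations(intermediate, prompts, lambda p: [p + "_gen"])
print(len(generations) - len(intermediate), len(generations))
```

The sketch assumes the saved generations are stored in problem order, so the length of the intermediate list doubles as the starting index for the remaining problems.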