bigscience-workshop / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Revert option for minimum generation length #118

Closed Muennighoff closed 2 years ago

Muennighoff commented 2 years ago

The previous implementation was slightly incorrect due to padding on input_ids, I think. You would have to e.g. sum across the attention mask to determine the shortest sequence. Instead, it's cleaner to just set it to None imo.
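The issue the comment points at can be sketched as follows. This is a hypothetical illustration, not the harness's actual code: with a padded batch, `input_ids.shape[1]` reflects the padded width, so a per-batch minimum generation length would have to come from the shortest *unpadded* sequence, obtained by summing each row of the attention mask (where 1 marks real tokens and 0 marks padding).

```python
# Hypothetical sketch: derive the shortest true sequence length in a
# padded batch from the attention mask, rather than from the padded
# width of input_ids. Plain lists stand in for tensors here.

def shortest_sequence_length(attention_mask):
    """attention_mask: one 0/1 list per batch element; 1 = real token."""
    # Summing a row counts its real tokens; min over rows gives the
    # shortest unpadded sequence in the batch.
    return min(sum(mask) for mask in attention_mask)

# Example batch: two prompts padded to a common width of 5.
batch_mask = [
    [1, 1, 1, 1, 1],  # true length 5
    [0, 0, 1, 1, 1],  # true length 3 (left-padded)
]
print(shortest_sequence_length(batch_mask))  # -> 3, not the padded 5
```

Using the padded width directly would overstate the minimum by the amount of padding, which is the inaccuracy the revert sidesteps by setting the option to None.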

You may not want to merge the mlsum changes though

jon-tow commented 2 years ago

@Muennighoff I'm wondering if we should revert the min-generation-tokens constraint for the following reasons:

  1. It introduces an "unnatural" confounder into the evaluation process of a model.
  2. Empirically, it seems that generating an end-of-sequence token as the first token occurs almost exclusively in the 0-shot setting.

cc @StellaAthena for opinions.

Muennighoff commented 2 years ago

Sure thanks 👍