EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Replacing "A:" with "A: " in the output prefix changes generations #1043

Closed milliemaoo closed 11 months ago

milliemaoo commented 11 months ago

While working with T5-series models (non-instruction-tuned) on BigBench tasks (e.g., sports_understanding), I noticed that generation is sensitive to a trailing space character in the output_prefix.

For instance, with the input 'Determine whether an artificially constructed sentence relating to sports is plausible or not.\n\nQ: Is the following sentence plausible? "Bam Adebayo scored a reverse layup in the Western Conference Finals."\nA: yes (...omit) Q: Is the following sentence plausible? "Emmanuel Sanders got a base hit."\nA:', the model generates the correct response. However, changing the output prefix from "A:" to "A: " results in truncated outputs like "ye".
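A plausible reading of this behavior is subword tokenization: in SentencePiece-style vocabularies (like T5's), a word with its leading space (e.g. " yes") is often a single token, while the same word without the space may split into smaller pieces. The toy greedy tokenizer below is a minimal sketch of this effect; the vocabulary is invented for illustration and is not T5's actual vocabulary.

```python
def tokenize(text, vocab):
    # Greedy longest-match segmentation, loosely mimicking how
    # subword vocabularies (BPE/SentencePiece) split text.
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Fall back to a single character if no piece matches.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical vocabulary: " yes" (with leading space) is one token,
# but bare "yes" is not, so it must be assembled from smaller pieces.
VOCAB = {" yes", " no", "ye", "s", "A", ":", " "}

print(tokenize(" yes", VOCAB))  # [' yes'] -- what follows the prompt "A:"
print(tokenize("yes", VOCAB))   # ['ye', 's'] -- what follows "A: "
```

Under this sketch, a prompt ending in "A:" lets the model emit the familiar single token " yes", whereas a prompt ending in "A: " forces the continuation onto rarer pieces like "ye" + "s", which can explain a truncated output such as "ye".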

I’m not sure why this occurs, but I wanted to share this observation. Feel free to correct me if I missed anything.

StellaAthena commented 11 months ago

You're not missing anything at all :) This is a very common and unfortunate problem with evaluating models, especially ones that aren't instruction-finetuned.