When I worked with the T5 series models (non-instruction-tuned) on BIG-bench tasks (e.g., sports_understanding), I noticed that generation is sensitive to a trailing space character in the output_prefix.
For instance, when using the input 'Determine whether an artificially constructed sentence relating to sports is plausible or not.\n\nQ: Is the following sentence plausible? "Bam Adebayo scored a reverse layup in the Western Conference Finals."\nA: yes (...omit) Q: Is the following sentence plausible? "Emmanuel Sanders got a base hit."\nA:', the model generates the correct response. However, altering the output prefix from "A:" to "A: " results in incomplete outputs like "ye".
I'm not sure why this occurs, but I wanted to share the observation. Feel free to correct me if I'm missing anything.
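For reference, here is a rough sketch of how the behaviour can be reproduced. The checkpoint (`t5-large`), the decoding settings, and the single in-context example are assumptions on my part; the omitted few-shot examples from the full prompt are left out.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Assumed checkpoint and settings; the original run may have used a different
# T5 variant and decoding configuration.
tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

few_shot = (
    "Determine whether an artificially constructed sentence relating to sports "
    "is plausible or not.\n\n"
    'Q: Is the following sentence plausible? "Bam Adebayo scored a reverse '
    'layup in the Western Conference Finals."\nA: yes\n\n'
    'Q: Is the following sentence plausible? "Emmanuel Sanders got a base hit."\n'
)

# Identical prompts except for the trailing space after the output prefix.
for prefix in ("A:", "A: "):
    input_ids = tokenizer(few_shot + prefix, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=5)
    print(repr(prefix), "->",
          repr(tokenizer.decode(output_ids[0], skip_special_tokens=True)))
```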
You're not missing anything at all :) This is a very common and unfortunate problem with evaluating models, especially ones that aren't instruction-finetuned.
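To make the whitespace sensitivity concrete: SentencePiece tokenizers like T5's mark word-initial pieces with a leading `▁`, so moving the space from the front of the expected answer to the end of the prompt changes the token boundary the model has to continue from. A quick way to inspect this (a sketch, assuming the `t5-large` tokenizer since the exact checkpoint isn't stated above):

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-large")  # assumed checkpoint

# How does the end of the prompt tokenize with and without the trailing space?
print(tokenizer.tokenize("A:"))
print(tokenizer.tokenize("A: "))

# And how does the answer tokenize when it starts a new word vs. when the
# preceding space has already been absorbed into the prompt?
print(tokenizer.tokenize("A: yes"))
print(tokenizer.tokenize("A:yes"))
```

Comparing the printed token sequences shows that `"A:"` and `"A: "` are not necessarily the same input once tokenized, which is why evaluation harnesses have to be careful about exactly where whitespace sits around the output prefix.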