Adds an add_special_tokens property to the HuggingFaceAutoLM base class so that users can specify whether model inputs should be encoded with special tokens.
Previously, special tokens were omitted from the tokenization of inputs/labels for Seq2Seq models, which most likely resulted in sub-optimal/unfair evaluations.
The defaults are now:
add_special_tokens=False for causal models, and
add_special_tokens=True for seq2seq models.
Updates seq2seq model unit tests to reflect this change.
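
For illustration, here is a minimal sketch of how the defaults above could be resolved and what they change about an encoded input. It is not the code in this PR: resolve_add_special_tokens, DEFAULT_ADD_SPECIAL_TOKENS, and ToyTokenizer are hypothetical names, and the toy tokenizer only imitates tokenizers that append an end-of-sequence token when special tokens are enabled.

```python
from typing import Dict, List, Optional

# Hypothetical table of the defaults described in this PR; the real
# HuggingFaceAutoLM subclasses may store them differently.
DEFAULT_ADD_SPECIAL_TOKENS: Dict[str, bool] = {"causal": False, "seq2seq": True}


def resolve_add_special_tokens(model_type: str, override: Optional[bool] = None) -> bool:
    """Return the user's explicit setting if given, else the per-model-type default."""
    if override is not None:
        return override
    return DEFAULT_ADD_SPECIAL_TOKENS[model_type]


class ToyTokenizer:
    """Toy whitespace tokenizer (an assumption for this sketch, not a real library class)."""

    EOS_ID = 1  # stand-in for an end-of-sequence special token id

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        # Assign ids starting at 2 so they never collide with EOS_ID.
        ids = [self.vocab.setdefault(tok, len(self.vocab) + 2) for tok in text.split()]
        return ids + [self.EOS_ID] if add_special_tokens else ids


tokenizer = ToyTokenizer()
# Causal default: no special tokens are added to the input.
causal_ids = tokenizer.encode("hello world", resolve_add_special_tokens("causal"))
# Seq2seq default: the end-of-sequence token is appended.
seq2seq_ids = tokenizer.encode("hello world", resolve_add_special_tokens("seq2seq"))
```

With these defaults, the seq2seq encoding differs from the causal one only by the trailing end-of-sequence id, and an explicit override takes precedence over either default.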