bjascob / amrlib

A python library that makes AMR parsing, generation and visualization simple.
MIT License

This tokenizer was incorrectly instantiated with a model max length of 512 #49

Closed bjascob closed 5 months ago

bjascob commented 2 years ago

After updating my system to transformers 4.19.4 (previously 4.16.2) with SentencePiece 0.1.96, I'm getting the following message:

FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.

This only happens for the T5Tokenizer, not BART, so it only impacts the generate_t5 and generate_t5wtense models.

To reproduce...

>>> from transformers import T5Tokenizer
>>> tokenizer = T5Tokenizer.from_pretrained('t5-base')

Note that this also happens with AutoTokenizer.from_pretrained('t5-base'), but not with bart-base.

It looks like this is an issue in the transformers code itself, since T5 is expected to have a max length of 512. Since I'm explicitly setting max_length=512 during tokenization, it shouldn't matter in practice. Hopefully the message will go away with later updates to the transformers library.

bjascob commented 5 months ago

Because T5 uses relative positional embeddings, it can support sequences longer than 512, although it was trained with a max length of 512. To allow this you need to pass max_train_graph_len=512 to the tokenizer when loading.

Since both the bart and t5 parsers are trained with max_train_sent_len=100 and max_train_graph_len=512, this only impacts inference and generation. With T5 being a rarely used model for parsing, this is a minor issue with limited impact.

If T5 becomes more widely used in the future, a change could be considered, but for now I'm closing this issue as "won't fix".