JohnGiorgi opened 4 years ago
Hi,

I noticed something weird about the `max_len` attribute of the tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
print(tokenizer.max_len)  # => 1000000000000000019884624838656
```
Whereas I expected it to be 512, as in:

```python
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.max_len)  # => 512
```
Is this a bug? Or is `max_len` not the appropriate attribute to use if I want to know the maximum input length for the model?
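For what it's worth, the huge value looks like transformers' "no known limit" sentinel rather than a real length: when a checkpoint's tokenizer config does not specify a maximum input size, the library falls back to a constant (`VERY_LARGE_INTEGER` in the tokenization utilities, if I read the source correctly) defined as `int(1e30)`. The exact digits are just the float rounding of 1e30:

```python
# int(1e30) is not exactly 10**30 because 1e30 is a float: the nearest
# representable double to 10**30 converts to this exact integer, which
# matches the max_len reported for allenai/scibert_scivocab_uncased.
print(int(1e30))            # => 1000000000000000019884624838656
print(int(1e30) == 10**30)  # => False
```

So the attribute is being populated, just with a fallback that effectively means "unlimited" because the SciBERT checkpoint ships no max-length value.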