Hugging-Face-Supporter / tftokenizers

Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels
Apache License 2.0

Tokenizer's `model_max_length` is not consistent #3

Open MarkusSagen opened 2 years ago

MarkusSagen commented 2 years ago

Most tokenizers define their max model length as 510 tokens or more, based on the maximum sequence length of the underlying model (e.g. 512 positions minus the `[CLS]` and `[SEP]` special tokens for BERT-style models).

Most tokenizers follow this convention, but some report a practically unlimited length, with `tokenizer.model_max_length = 1000000000000000019884624838656` (see the example below).
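
For illustration, a quick way to see these values is to load a couple of tokenizers and print `model_max_length`; the checkpoint names here are just examples, and which ones report the sentinel depends on the `transformers` version and each checkpoint's tokenizer config:

```python
from transformers import AutoTokenizer

# Illustrative only: which checkpoints report the "no limit" sentinel
# depends on the transformers version and the hosted tokenizer configs.
for name in ["bert-base-uncased", "gpt2", "xlm-roberta-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.model_max_length)

# bert-base-uncased prints 512; tokenizers with no configured limit print
# the sentinel 1000000000000000019884624838656 (roughly 1e30).
```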

This means that when converting the tokenizer's max length to TensorFlow, most values are assumed to fit in a 32-bit int, but the near-infinite sentinel overflows it (it does not even fit in `tf.int64`), so the value needs a wider representation or has to be clamped to a practical maximum for the conversion not to fail.
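
To make the failure mode concrete, here is a small sketch (not code from this repo) showing that the sentinel overflows both the 32-bit and the 64-bit integer range, which is why a plain cast to a TensorFlow integer tensor cannot work:

```python
import numpy as np

# The value transformers uses when no maximum length is configured (~1e30).
SENTINEL = 1000000000000000019884624838656

print(SENTINEL > np.iinfo(np.int32).max)  # True: does not fit in int32
print(SENTINEL > np.iinfo(np.int64).max)  # True: does not fit in int64 either
# tf.constant(SENTINEL, dtype=tf.int64) would therefore fail, so the value
# has to be clamped (or treated as "unlimited") before conversion.
```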


Initially, the tokenizer's `model_max_length` was set dynamically, but it is now hard-coded to 510 tokens. This should be changed to reflect each tokenizer's actual maximum length (see the sketch below).
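
A minimal sketch of what resolving the length dynamically could look like, assuming a hypothetical `resolve_max_length` helper and a fallback of 512 tokens (both are illustrative choices, not the library's actual API):

```python
import tensorflow as tf
from transformers import AutoTokenizer

# transformers reports roughly 1e30 when no maximum length is configured.
NO_LIMIT_SENTINEL = int(1e30)
FALLBACK_MAX_LENGTH = 512  # assumed default; choose per model


def resolve_max_length(tokenizer) -> tf.Tensor:
    """Return the tokenizer's model_max_length as an int64 scalar,
    clamping the "unlimited" sentinel to a usable fallback."""
    max_len = tokenizer.model_max_length
    if max_len >= NO_LIMIT_SENTINEL:
        # The sentinel does not fit in any TensorFlow integer dtype.
        max_len = FALLBACK_MAX_LENGTH
    return tf.constant(max_len, dtype=tf.int64)


tok = AutoTokenizer.from_pretrained("bert-base-uncased")
print(resolve_max_length(tok))  # tf.Tensor(512, shape=(), dtype=int64)
```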