dbmdz / berts

DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models

max sequence length #33

Open denizerden opened 3 years ago

denizerden commented 3 years ago

How can I use the Turkish sentiment cased BERT model to calculate sentiment scores for sentences longer than the 512-token maximum sequence length?

stefan-it commented 3 years ago

Hi @denizerden,

normally, longer sentences will simply be truncated.
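Here is a minimal sketch of that truncation behavior with the `transformers` tokenizer; `dbmdz/bert-base-turkish-cased` is used as a stand-in, so substitute the sentiment checkpoint you are actually loading:

```python
from transformers import AutoTokenizer

# Stand-in model name; replace with your sentiment checkpoint.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")

long_text = "çok uzun bir cümle " * 400  # deliberately longer than 512 tokens

# Without truncation the encoded input exceeds the model's 512-token limit
# and would fail at inference time.
untruncated = tokenizer(long_text)
print(len(untruncated["input_ids"]))  # > 512

# With truncation=True everything after the first 512 subtokens is discarded.
truncated = tokenizer(long_text, truncation=True, max_length=512)
print(len(truncated["input_ids"]))  # 512
```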

Alternatively, you could use the BERTurk models with a larger vocab (128k). With these models you should be able to handle "longer" sentences, because the tokenizer in theory produces fewer subtokens per token (compared to the "normal" BERTurk models that have a 32k vocab).
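As a rough illustration (assuming the 128k checkpoint name `dbmdz/bert-base-turkish-128k-cased` as listed on the Hugging Face model hub), you can compare how many subtokens each tokenizer produces for the same sentence:

```python
from transformers import AutoTokenizer

tok_32k = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-cased")
tok_128k = AutoTokenizer.from_pretrained("dbmdz/bert-base-turkish-128k-cased")

sentence = "Kullanıcı yorumlarının duygu analizi oldukça zorlayıcı olabilir."

# The larger vocab covers more whole words, so fewer words get split into
# subword pieces and the same text consumes fewer of the 512 slots.
print(len(tok_32k.tokenize(sentence)))
print(len(tok_128k.tokenize(sentence)))
```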