Does it work for very long documents?

Hello there, I am trying to get this working with the "mt5" model type, since I want to use it on an Italian dataset. Unfortunately, all of my documents are longer than the maximum length supported by the model. I therefore tried passing truncation=True and max_length=512 to the tokenizer call inside split_into_paragraphs():

wc_temp = len(self.tokenizer.tokenize(temp, max_length=512, truncation=True))

This does not help. I still get the following message:

Token indices sequence length is longer than the specified maximum sequence length for this model (6508 > 512). Running this sequence through the model will result in indexing errors.
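For reference, the behaviour I am after could be sketched roughly as below: split a long document into pieces that each stay within a token budget, instead of truncating it. This is only a minimal sketch and not the library's actual implementation. The names split_by_token_budget and whitespace_token_count are hypothetical, and the whitespace counter is a stand-in for the real model tokenizer.

```python
from typing import Callable, List


def split_by_token_budget(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    Each chunk holds as many words as fit within max_tokens according to
    count_tokens. A single word that alone exceeds the budget is kept as
    its own chunk rather than being dropped.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this word would exceed the budget: close the chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def whitespace_token_count(s: str) -> int:
    """Hypothetical stand-in counter: one token per whitespace-separated word."""
    return len(s.split())


if __name__ == "__main__":
    document = " ".join(f"word{i}" for i in range(1200))
    pieces = split_by_token_budget(document, whitespace_token_count, max_tokens=512)
    print([whitespace_token_count(p) for p in pieces])
```

With a Hugging Face tokenizer, count_tokens could presumably be something like lambda s: len(tokenizer(s)["input_ids"]), which also counts any special tokens the tokenizer adds. Re-counting the whole candidate on every word is simple but slow for very long inputs, so this is meant to show the idea rather than to be efficient.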

Have you already found a solution to this problem?

Thank you in advance!
