KeyPhraseTransformer lets you quickly extract key phrases, topics, themes from your text data with T5 transformer | Keyphrase extraction | Keyword extraction
Hello there, I am trying to get this working with the "mt5" model type, since I want to use it on an Italian dataset. Unfortunately, all of my documents are longer than the maximum length supported by the model, so I tried specifying truncation=True, max_length=512 in the split_into_paragraphs() function, at the line wc_temp = len(self.tokenizer.tokenize(temp, max_length=512, truncation=True)). This does not work; I still get the warning: Token indices sequence length is longer than the specified maximum sequence length for this model (6508 > 512). Running this sequence through the model will result in indexing errors.
Have you already found the solution to this problem?
Thank you in advance!
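One possible workaround, sketched here under stated assumptions rather than as the library's own method: instead of truncating inside the word-count step, split each long document into chunks that fit a token budget before it reaches the model. The names split_by_token_budget and whitespace_token_count are hypothetical and are not part of KeyPhraseTransformer. The default counter just splits on whitespace so the example runs without downloading a model; with a real tokenizer you would pass your own counting function, keeping in mind that subword tokenizers usually produce more tokens than a whitespace split.

```python
import re
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    # Stand-in counter for illustration only. A subword tokenizer (such as
    # the one used by mT5) will generally report a larger count.
    return len(text.split())


def split_by_token_budget(
    text: str,
    max_tokens: int = 512,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Greedily pack sentences into chunks of at most max_tokens tokens.

    Hypothetical helper, not part of KeyPhraseTransformer. A single sentence
    that exceeds the budget on its own is kept as one oversize chunk.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk if adding this sentence would overflow it.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n

    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in split_by_token_budget(sample, max_tokens=6):
        print(chunk)
```

Each chunk can then be processed separately and the resulting key phrases merged. Summing per-sentence counts is only an approximation of the length of the joined chunk, so leaving some headroom below the model limit is a reasonable precaution.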