While using extractor.load_document(), I am encountering this error:
ValueError: [E088] Text of length 1717453 exceeds maximum of 1000000. The parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text).
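For reference, the limit the error refers to is the max_length attribute of a spaCy pipeline. A minimal sketch of checking the input size and raising that limit on a standalone pipeline (assuming en_core_web_sm is installed; the file path is a placeholder) would look like this:

import spacy

text = open("document.txt", encoding="utf-8").read()  # placeholder path; ~1.7M characters in my case
print(len(text))  # the limit is counted in characters, not tokens

nlp = spacy.load("en_core_web_sm")
nlp.max_length = 2_000_000  # raise the limit above len(text)
doc = nlp(text)  # with the parser/NER enabled this can require several GB of RAM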
I have already referred to this related issue: #68
Code used:
import pke

def pke_topicrank(text):
    # initialize keyphrase extraction model, here TopicRank
    extractor = pke.unsupervised.TopicRank()
    # load the content of the document; the document is expected to be a simple
    # text string and preprocessing is carried out using spacy
    extractor.load_document(input=text, language="en", normalization=None)
    # keyphrase candidate selection, in the case of TopicRank: sequences of nouns
    # and adjectives (i.e. `(Noun|Adj)*`)
    pos = {'NOUN', 'PROPN', 'ADJ'}
    extractor.candidate_selection(pos=pos)
    # grammar-based candidate selection (noun phrases)
    extractor.grammar_selection(grammar="NP: {<ADJ>*<NOUN|PROPN>+}")
    # candidate weighting, in the case of TopicRank: using a random walk algorithm
    extractor.candidate_weighting(threshold=0.74, method='average')
    # N-best selection, keyphrases contains the 10 highest scored candidates as
    # (keyphrase, score) tuples
    keyphrases = extractor.get_n_best(n=10, redundancy_removal=True, stemming=True)
    keyphrases = ', '.join(set(candidate for candidate, weight in keyphrases))
    return keyphrases
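For context, the error only appears for long inputs; a quick call on a made-up short sample string (well under the 1,000,000-character limit) would look like this:

sample = (
    "Keyphrase extraction identifies the most important phrases in a document. "
    "TopicRank clusters candidate phrases into topics and ranks the topics "
    "with a graph-based algorithm."
)
print(pke_topicrank(sample))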
Solutions tried:
1. Increasing nlp.max_length to a higher value manually while loading the spaCy pre-trained model (spaCy was installed following the steps listed for GPU support); this still results in the same error shown above (a chunk-based workaround is sketched after this list).
2. Adding a sentencizer to the pipeline; with the sentencizer added, the extractor returns no keyphrases.
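If pke loads its own spaCy pipeline internally when given a raw string (in which case setting max_length on a separately created nlp object would not be picked up), one version-independent workaround is to split the text into chunks below the 1,000,000-character limit and merge the per-chunk results. This is only a sketch; TopicRank then scores each chunk in isolation, so the ranking is an approximation of a full-document run:

def pke_topicrank_chunked(text, chunk_size=900_000):
    # run pke_topicrank on slices that stay under spaCy's default max_length
    # note: slicing on a fixed character count can cut a phrase at chunk borders
    keyphrases = set()
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        keyphrases.update(kp.strip() for kp in pke_topicrank(chunk).split(',') if kp.strip())
    return ', '.join(sorted(keyphrases))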