Closed · edoost closed this issue 6 years ago
Hi @edoost,
For (1), you should set a new pattern for selecting the candidates. For (2), the nltk stemmer is used by default and Persian is not currently supported. So there are two solutions: write a new input reader method in read_raw_document(), or use a CoreNLP input file in which the stems are placed as lemmas, together with the use_lemmas=True option in read_document().
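
For illustration, here is a minimal, hypothetical CoreNLP-style XML fragment (the token text and stem are invented transliterated Persian) showing the second approach: the Persian stem is stored in the lemma field, so reading with use_lemmas=True normalizes candidates by stem.

```xml
<root>
  <document>
    <sentences>
      <sentence id="1">
        <tokens>
          <token id="1">
            <word>ketabha</word>  <!-- surface form -->
            <lemma>ketab</lemma>  <!-- Persian stem placed in the lemma slot -->
            <POS>NN</POS>
          </token>
        </tokens>
      </sentence>
    </sentences>
  </document>
</root>
```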
Below is a snippet of code that summarizes that:
import pke

# initialize the extractor on the CoreNLP input file
extractor = pke.unsupervised.MultipartiteRank(input_file=input_file)

# read the document, using the lemma field for candidate normalization
extractor.read_document(format='corenlp', use_lemmas=True)

# select candidates with a noun-then-adjective grammar (Persian word order)
extractor.grammar_selection(grammar="NP: {<NN.*>+<JJ.*>*}")

# weight the candidates and keep the 5 highest-scoring keyphrases
extractor.candidate_weighting()
keyphrases = extractor.get_n_best(n=5)

for keyphrase, score in keyphrases:
    print(keyphrase, score)
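
To see what the noun-then-adjective grammar "NP: {<NN.*>+<JJ.*>*}" selects, here is a minimal sketch in plain Python (no pke or nltk required) that applies the same pattern over a POS tag sequence. The tagged sentence is a hypothetical transliterated Persian example, invented for illustration.

```python
import re

def select_candidates(tagged_tokens):
    """Return token spans matching one-or-more nouns followed by
    zero-or-more adjectives, i.e. the Persian word order NP grammar."""
    # flatten the tag sequence into a string so a regex can scan it
    tags = " ".join(tag for _, tag in tagged_tokens) + " "
    candidates = []
    for m in re.finditer(r"(?:NN\S* )+(?:JJ\S* )*", tags):
        # map character offsets back to token indices
        start = tags[: m.start()].count(" ")
        end = start + m.group().count(" ")
        candidates.append(" ".join(w for w, _ in tagged_tokens[start:end]))
    return candidates

# hypothetical transliterated Persian: noun followed by its adjective
tagged = [("ketab", "NN"), ("khub", "JJ"), ("va", "CC"), ("jaleb", "JJ")]
print(select_candidates(tagged))  # → ['ketab khub']
```

The pattern mirrors the grammar passed to grammar_selection(): nouns come first, adjectives may follow, so "ketab khub" (noun + adjective) is kept while the stray adjective after the conjunction is not.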
@boudinfl Thank you very much. It's working.
Hi.
I am using MultipartiteRank to extract keyphrases from Persian documents, and I have two questions:
1. In the paper, it is stated that candidates matching the pattern /adj* noun+/ are selected. In Persian, adjectives appear after nouns; how can I make candidate selection work correctly in this case?
2. Topics are built from the stems of the words. How should I supply the stems when I am using the 'preprocessed' mode to read the documents?
Thanks