Closed pitskod closed 5 years ago
Hi @pitskod,
Sorry for the very late response. pke does allow computing TF for lemmatized words: set the normalization parameter to 'lemmatization' in the load_document() method.
import pke
text = '''pke is an open source python-based keyphrase extraction toolkit.'''
extractor = pke.unsupervised.TopicRank()
extractor.load_document(input=text, language='en', normalization=None)
print(extractor.sentences[0].stems)
> ['pke', 'is', 'an', 'open', 'source', 'python', '-', 'based', 'keyphrase', 'extraction', 'toolkit', '.']
extractor.load_document(input=text, language='en', normalization='stemming')
print(extractor.sentences[0].stems)
> ['pke', 'is', 'an', 'open', 'sourc', 'python', '-', 'base', 'keyphras', 'extract', 'toolkit', '.']
extractor.load_document(input=text, language='en', normalization='lemmatization')
print(extractor.sentences[0].stems)
> ['pke', 'be', 'an', 'open', 'source', 'python', '-', 'base', 'keyphrase', 'extraction', 'toolkit', '.']
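To make the effect on term frequency concrete, here is a minimal sketch that counts TF over the stemmed and lemmatized token lists printed above. It uses only the standard library, not pke's own tf-idf code, and the helper name term_frequencies is hypothetical rather than part of pke's API.

```python
from collections import Counter

# Token lists copied from the stemming and lemmatization outputs above.
stemmed = ['pke', 'is', 'an', 'open', 'sourc', 'python', '-', 'base',
           'keyphras', 'extract', 'toolkit', '.']
lemmatized = ['pke', 'be', 'an', 'open', 'source', 'python', '-', 'base',
              'keyphrase', 'extraction', 'toolkit', '.']


def term_frequencies(tokens):
    """Count each alphanumeric token, skipping punctuation such as '-' and '.'."""
    return Counter(token for token in tokens if token.isalnum())


tf_stem = term_frequencies(stemmed)
tf_lemma = term_frequencies(lemmatized)

# With lemmatization, 'is' is counted under its lemma 'be'.
print(tf_lemma['be'], tf_lemma['is'])
# With stemming, 'source' is counted under the stem 'sourc'.
print(tf_stem['sourc'], tf_stem['source'])
```

The counts are keyed by whatever normalized form ends up in the stems list, so the choice of normalization decides which surface forms are merged into one term.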
Please let me know if you encounter any issue with that.
f.
For tf-idf there is no way to compute TF over the lemmatized form of a word (we can only count TF for stemmed words or for words with no normalization). Maybe in the load_file method, in the # word normalization section, we need to add a condition for lemmatization like:
elif self.normalization == 'lemmatization':
    for i, sentence in enumerate(self.sentences):
        self.sentences[i].stems = sentence.stems