vg-pulsepoint closed this issue 7 years ago
This is how the package currently works:
from rake_nltk import Rake
r = Rake()  # Uses English stopwords from NLTK and all punctuation characters.
r = Rake(<language>)  # To use a specific language supported by NLTK.
# To provide your own set of stop words and punctuation instead:
# r = Rake(<list of stopwords>, <string of punctuation to ignore>)
r.extract_keywords_from_text(<text to process>)
r.get_ranked_phrases() # To get keyword phrases ranked highest to lowest.
Parameter tuning is not supported yet. You can raise a feature request and I will try to get to it.
How do we tune the Rake parameters, as in this example:
https://www.airpair.com/nlp/keyword-extraction-tutorial
So for example:
rake_object = rake.Rake("SmartStoplist.txt", 5, 3, 4)
Here each word has at least 5 characters, each phrase has at most 3 words, and each keyword appears in the text at least 4 times.
The right values for these parameters depend on the corpus.
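One possible workaround while the package has no such parameters is to filter the ranked phrases after extraction. The following is only a minimal sketch: `filter_phrases` and its arguments `min_chars`, `max_words`, and `min_freq` are hypothetical names that mirror the three numbers in the example above, and they are not part of rake_nltk. It also assumes that counting whole-word matches in the lowercased text is an acceptable way to measure frequency.

```python
import re


def filter_phrases(phrases, text, min_chars=5, max_words=3, min_freq=4):
    """Post-filter keyword phrases (hypothetical helper, not part of rake_nltk).

    A phrase is kept only if every word has at least `min_chars` characters,
    the phrase has at most `max_words` words, and the phrase occurs in `text`
    at least `min_freq` times (case-insensitive, whole-word matches).
    """
    # Normalize the text to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    kept = []
    for phrase in phrases:
        words = re.findall(r"\w+", phrase.lower())
        if not words or len(words) > max_words:
            continue
        if any(len(word) < min_chars for word in words):
            continue
        # Count non-overlapping whole-word occurrences of the phrase.
        pattern = r"\b" + r"\s+".join(re.escape(word) for word in words) + r"\b"
        if len(re.findall(pattern, normalized)) < min_freq:
            continue
        kept.append(phrase)
    return kept
```

With rake_nltk this could be applied as `filter_phrases(r.get_ranked_phrases(), text)` after calling `r.extract_keywords_from_text(text)`. The input order is preserved, so the result stays ranked highest to lowest. The thresholds would still need to be chosen per corpus.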