csurfer / rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
https://csurfer.github.io/rake-nltk
MIT License
1.06k stars, 150 forks

How to set RAKE parameters? #10

Closed by vg-pulsepoint 7 years ago

vg-pulsepoint commented 7 years ago

How do we tune the Rake parameters, as in this example:

https://www.airpair.com/nlp/keyword-extraction-tutorial

So for example:

rake_object = rake.Rake("SmartStoplist.txt", 5, 3, 4)

Each word has at least 5 characters, each phrase has at most 3 words, and each keyword appears in the text at least 4 times.

These parameters need to change depending on the corpus.

csurfer commented 7 years ago

This is how this package works.

from rake_nltk import Rake
r = Rake()  # Uses English stopwords from NLTK and all punctuation characters.
r = Rake(<language>)  # To use a specific language supported by NLTK.

# To provide your own set of stop words and punctuation instead:
# r = Rake(<list of stopwords>, <string of punctuation to ignore>)
r.extract_keywords_from_text(<text to process>)
r.get_ranked_phrases() # To get keyword phrases ranked highest to lowest.

This package does not support tuning those parameters yet. You can raise a feature request and I will try to get to it.
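
In the meantime, one possible workaround is to filter the ranked phrases after extraction. The sketch below is not part of rake-nltk: `filter_phrases` and its parameters `min_chars`, `max_words`, and `min_frequency` are hypothetical names chosen to mirror the three thresholds in the question, and the way each threshold is interpreted (for example, counting whole-phrase occurrences in the lowercased text) is an assumption rather than the behaviour of the linked tutorial's implementation.

```python
import re


def filter_phrases(phrases, text, min_chars=5, max_words=3, min_frequency=4):
    """Keep only phrases that satisfy three simple thresholds.

    Hypothetical helper, not part of rake-nltk. Assumed interpretation:
    every word has at least `min_chars` characters, the phrase has at most
    `max_words` words, and the phrase occurs at least `min_frequency`
    times in `text` (case-insensitive).
    """
    lowered = text.lower()
    kept = []
    for phrase in phrases:
        words = phrase.lower().split()
        if not words or len(words) > max_words:
            continue
        if any(len(word) < min_chars for word in words):
            continue
        # Match the whole phrase, allowing any whitespace between its words.
        pattern = r"\b" + r"\s+".join(re.escape(word) for word in words) + r"\b"
        if len(re.findall(pattern, lowered)) < min_frequency:
            continue
        kept.append(phrase)
    return kept
```

With rake-nltk, `phrases` would be the list returned by r.get_ranked_phrases() and `text` the string passed to r.extract_keywords_from_text(), so the ranking order is preserved and only the phrases that fail a threshold are dropped.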