RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text.
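The description above can be made concrete with a minimal, self-contained sketch of the scoring idea in plain Python. This is not rake-nltk's code. The tiny STOPWORDS set and the ASCII regex tokenizer are assumptions made for illustration; the library uses NLTK's stopwords and tokenizers by default. Candidate phrases are the runs of words between stopwords and punctuation. Each word w gets the score deg(w)/freq(w), where freq(w) counts its occurrences in candidate phrases and deg(w) adds the length of the enclosing phrase for each occurrence (its co-occurrences, itself included). A phrase scores the sum of its word scores.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}

# Assumed tokenizer: runs of ASCII letters/digits are words; any other
# single non-space character is treated as punctuation.
TOKEN_RE = re.compile(r"[a-z0-9]+|[^a-z0-9\s]")
WORD_RE = re.compile(r"[a-z0-9]+")


def candidate_phrases(text):
    """Split text into candidate phrases at stopwords and punctuation."""
    phrases, current = [], []
    for token in TOKEN_RE.findall(text.lower()):
        if token in STOPWORDS or not WORD_RE.fullmatch(token):
            # A delimiter ends the current phrase (if any).
            if current:
                phrases.append(tuple(current))
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(tuple(current))
    return phrases


def rake(text):
    """Return (score, phrase) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Each occurrence co-occurs with every word of its phrase,
            # itself included, so it adds len(phrase) to the degree.
            degree[word] += len(phrase)
    word_score = {word: degree[word] / freq[word] for word in freq}
    # A phrase scores the sum of its member word scores; a repeated
    # phrase is kept once.
    phrase_score = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(((s, p) for p, s in phrase_score.items()), reverse=True)


print(rake("Linear Diophantine equations. Linear constraints."))
# [(8.5, 'linear diophantine equations'), (4.5, 'linear constraints')]
```

In this example "linear" appears in both phrases, so its frequency is 2 and its degree is 3 + 2 = 5, giving it a score of 2.5, lower than "diophantine" or "equations" (3.0 each), which occur only in the longer phrase. The degree-to-frequency ratio therefore favors words that occur mostly inside longer candidate phrases.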
Installation using pip:
pip install rake-nltk
Or directly from the repository:
git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install
from rake_nltk import Rake

# Uses English stopwords from NLTK and all punctuation characters by
# default.
r = Rake()

# Extraction given the text.
text = "..."  # the text to process
r.extract_keywords_from_text(text)

# Extraction given the list of strings where each string is a sentence.
sentences = ["...", "..."]  # the sentences to process
r.extract_keywords_from_sentences(sentences)

# To get keyword phrases ranked highest to lowest.
ranked_phrases = r.get_ranked_phrases()

# To get keyword phrases ranked highest to lowest with scores.
ranked_phrases_with_scores = r.get_ranked_phrases_with_scores()
If you see a stopwords error, it means that you do not have the NLTK stopwords corpus downloaded. You can download it using the command below.
python -c "import nltk; nltk.download('stopwords')"
This is a Python implementation of the algorithm described in the paper "Automatic Keyword Extraction from Individual Documents" by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.
Please use the issue tracker for reporting bugs or feature requests.

Development workflow:

1. Install poetry: pip install poetry
2. Create the project's virtual environment: poetry install
3. Run the tests: poetry run tox
   Any Python versions you don't have installed will fail this step. Fix failing tests and repeat.
4. Install pre-commit (pip install pre-commit) and run the lint checks: pre-commit run --all-files
5. Build the HTML documentation: poetry run sphinx-build -b html docs/ docs/_build/html
6. Generate requirements.txt for automated testing: poetry export --dev --without-hashes -f requirements.txt > requirements.txt