csurfer / rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
https://csurfer.github.io/rake-nltk
MIT License
1.06k stars, 150 forks

Minimum co-occurrence for adjacent words #18

Open atulpuri opened 6 years ago

atulpuri commented 6 years ago

I really like this implementation of the algorithm; however, I noticed one discrepancy.

The paper states (Section 1.2.3, Adjoining Keywords) that adjoining keywords must occur at least twice, in the same order, to be considered a single phrase. The current implementation doesn't account for this.

https://www.researchgate.net/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents
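
A minimal sketch of one reading of the adjoining-keyword rule, in plain Python rather than against the rake-nltk API. The function name `adjoining_keywords`, its parameters, and the `min_count` threshold are hypothetical names chosen for illustration. It assumes the input is already tokenized and lowercased with punctuation removed, and it counts a pair together with its interior stop words as one unit, which is a simplification:

```python
from collections import Counter


def adjoining_keywords(tokens, stopwords, min_count=2):
    """Return joined phrases whose keyword pair adjoins at least min_count times.

    tokens: lowercase word tokens for one document (assumed punctuation-free).
    stopwords: set of stop words that split the text into candidate keywords.
    """
    # Group tokens into alternating runs of keywords ("kw") and stop words ("stop").
    runs = []
    for tok in tokens:
        kind = "stop" if tok in stopwords else "kw"
        if runs and runs[-1][0] == kind:
            runs[-1][1].append(tok)
        else:
            runs.append((kind, [tok]))

    # Count each keyword / interior stop words / keyword sequence, in order.
    counts = Counter()
    for i in range(len(runs) - 2):
        left, middle, right = runs[i], runs[i + 1], runs[i + 2]
        if left[0] == "kw" and middle[0] == "stop" and right[0] == "kw":
            counts[" ".join(left[1] + middle[1] + right[1])] += 1

    # Keep only the sequences seen at least min_count times.
    return [phrase for phrase, n in counts.items() if n >= min_count]


tokens = "axis of evil is bad and the axis of evil is real".split()
print(adjoining_keywords(tokens, {"of", "is", "and", "the"}))
# prints ['axis of evil']
```

Phrases returned this way would be added as extra candidates before scoring; how that would plug into the library's existing phrase generation is left open here.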