csurfer / rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
https://csurfer.github.io/rake-nltk
MIT License
1.06k stars 150 forks

Don't think frequency distribution is working #28

Open odedniv opened 6 years ago

odedniv commented 6 years ago

I'm not an expert in NLTK, but I tried following the algorithm and I don't understand how it can work.

It seems _build_frequency_dist is supposed to count the frequency of phrases. However, the phrase_list it receives is the one generated by _generate_phrases, which returns a set(), so every phrase can appear in it only once.

The generated Counter object counts every phrase as appearing once.
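Here is a minimal sketch of what I mean, using only collections.Counter. The phrases and variable names are made up for illustration and this is not the library's actual code; it only assumes that phrases are hashable tuples of words and that they pass through a set() before being counted.

```python
from collections import Counter

# Hypothetical candidate phrases as tuples of words.
# ("keyword", "extraction") occurs three times in the input.
raw_phrases = [
    ("keyword", "extraction"),
    ("rapid", "automatic"),
    ("keyword", "extraction"),
    ("keyword", "extraction"),
]

# Converting to a set collapses the duplicates before any counting happens.
unique_phrases = set(raw_phrases)

# Counting after deduplication: every phrase has a count of exactly 1.
counts_from_set = Counter(unique_phrases)

# Counting the original list keeps the real number of occurrences.
counts_from_list = Counter(raw_phrases)

print(counts_from_set)
print(counts_from_list)
```

If the deduplication really happens before the counting, as I read it, the first Counter is the situation I'm describing, and the second is what I would have expected a frequency distribution to look like.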

This doesn't make sense, does it?