csurfer / rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
https://csurfer.github.io/rake-nltk
MIT License
1.06k stars 150 forks

Language selection #17

Closed digvijayv closed 6 years ago

digvijayv commented 6 years ago

When I select a language other than English in Rake, it does not identify key phrases; instead, it returns the whole sentence as a key phrase.

birm commented 6 years ago

Are you able to post a snippet/gist with the code you're using and the output?

juggernauts commented 6 years ago

from rake_nltk import Rake

r = Rake("german")
text = "Weitere Luftschläge gegen syrische Stellungen würden \"unweigerlich Chaos\" verursachen, droht Russlands Präsident Putin. Frankreich will den Konflikt jetzt mit einer neuen Uno-Resolution entschärfen."
r.extract_keywords_from_text(text)
print r.get_ranked_phrases()

The above code produces the following output: ['frankreich will den konflikt jetzt mit einer neuen uno', 'weitere luftschlge gegen syrische stellungen wrden', 'droht russlands prsident putin', 'unweigerlich chaos', 'resolution entschrfen', 'verursachen']

It seems like for languages other than English, it is not removing stopwords at all!

csurfer commented 6 years ago

@juggernauts: When you call the API as r = Rake("german"), you are not actually setting the language to German, because language is not the first parameter of the constructor. Pass it by keyword instead, as r = Rake(language="german"), and it will work as advertised.
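To illustrate why the positional call silently does the wrong thing, here is a minimal sketch that does not use rake_nltk itself. The function `toy_rake` is a hypothetical stand-in, and its parameter order (`stopwords`, `punctuations`, `language`) is an assumption made for illustration; the thread only confirms that `language` is not the first parameter.

```python
# Hypothetical stand-in for a constructor whose first parameter is NOT
# the language. The parameter names and order are assumptions, not the
# confirmed rake_nltk signature.
def toy_rake(stopwords=None, punctuations=None, language="english"):
    """Return the arguments as they were bound, so the mix-up is visible."""
    return {
        "stopwords": stopwords,
        "punctuations": punctuations,
        "language": language,
    }


# Positional call: "german" binds to the first parameter, and the
# language keeps its default value.
positional = toy_rake("german")

# Keyword call: "german" binds to the language parameter as intended.
keyword = toy_rake(language="german")

print(positional["language"])   # english
print(positional["stopwords"])  # german
print(keyword["language"])      # german
```

Under that assumption, the positional call leaves the language at its English default, which would be consistent with the German stopwords not being removed in the output reported above. Passing the argument by keyword avoids depending on the parameter order at all.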