csurfer / rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
https://csurfer.github.io/rake-nltk
MIT License
1.06k stars, 150 forks

fix: Set default tokenizer language to actual language parameter. #71

Open Peaverin opened 2 years ago

Peaverin commented 2 years ago

Currently, the default sentence tokenizer (nltk.tokenize.sent_tokenize) ignores the language set in the Rake constructor and falls back to its own default language, which is English (see https://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.sent_tokenize). This simple fix keeps the NLTK tokenize function as the default tokenizer but passes it the language the user set in the Rake constructor instead of relying on its default language parameter.
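
For illustration, here is a minimal sketch of the idea, not the actual diff. `RakeSketch` is a hypothetical, trimmed-down constructor, and the local `sent_tokenize` is a stand-in that only mirrors the `(text, language="english")` signature of `nltk.tokenize.sent_tokenize`, so the sketch runs without NLTK or its Punkt models.

```python
from functools import partial
from typing import Callable, List, Optional


def sent_tokenize(text: str, language: str = "english") -> List[str]:
    """Stand-in mirroring the signature of nltk.tokenize.sent_tokenize.

    Assumption: this is NOT a real tokenizer. It tags each piece with the
    language it was called with so the binding can be observed.
    """
    return [f"[{language}] {part.strip()}" for part in text.split(".") if part.strip()]


class RakeSketch:
    """Hypothetical, reduced constructor; not the real rake_nltk.Rake."""

    def __init__(
        self,
        language: str = "english",
        sentence_tokenizer: Optional[Callable[[str], List[str]]] = None,
    ) -> None:
        self.language = language
        if sentence_tokenizer is None:
            # Bind the user's language instead of relying on the
            # tokenizer's own default ("english").
            sentence_tokenizer = partial(sent_tokenize, language=language)
        self.sentence_tokenizer = sentence_tokenizer


rake = RakeSketch(language="spanish")
sentences = rake.sentence_tokenizer("Hola mundo. Adiós.")
```

A tokenizer passed in explicitly is left untouched, so only the default behaviour changes.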

Note that the word tokenizer does not need this change, as nltk.tokenize.wordpunct_tokenize is language-agnostic.
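
To show why no language parameter is involved there: NLTK documents `wordpunct_tokenize` as splitting text with the regular expression `\w+|[^\w\s]+`. The sketch below applies that pattern with the standard library only. The helper name `wordpunct_like` is hypothetical and this is not the NLTK implementation itself.

```python
import re
from typing import List

# Pattern documented for NLTK's WordPunctTokenizer: runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def wordpunct_like(text: str) -> List[str]:
    """Hypothetical helper: split text using the pattern above."""
    return WORDPUNCT_PATTERN.findall(text)


english_tokens = wordpunct_like("Can't stop.")
spanish_tokens = wordpunct_like("¿Cómo estás?")
```

The same pattern is applied to both inputs, so there is nothing language-specific to configure.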