csurfer / rake-nltk

Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.
https://csurfer.github.io/rake-nltk
MIT License

About Spanish #8

Closed chikiuso closed 6 years ago

chikiuso commented 7 years ago

Hi, may I ask for a walkthrough on using RAKE with Spanish? Thanks.

AliceLee1203 commented 7 years ago

Have you found a solution yet? I ran into the same problem with Chinese.

AxelAli commented 7 years ago

You could download the Spanish NLTK data. For example, I use r = Rake(language='portuguese') for Portuguese!

csurfer commented 6 years ago

@chikiuso: @AxelAli is right. As long as NLTK can tokenize sentences in the language of your choice (here Spanish, which I believe it can), you can create the object using

r = Rake(language='spanish')

or you can provide your own stopwords and punctuation set to the object as

r = Rake(stopwords=<list of stopwords>, punctuations=<list of punctuations>)

and so avoid specifying the language at all. Note, however, that the tokenizer still needs to know how to tokenize the language, so if NLTK doesn't support a particular language, this package won't either.
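
For readers who want to try both options, here is a minimal sketch rather than a definitive recipe. It assumes rake-nltk is installed along with the NLTK data it relies on (the stopwords corpus and the `punkt` tokenizer data; newer NLTK releases may ask for `punkt_tab` instead). The names `CUSTOM_STOPWORDS`, `CUSTOM_PUNCTUATIONS`, and `extract_keywords` are hypothetical and used only for illustration, and the stopword set is deliberately tiny.

```python
import string

# Hypothetical, deliberately tiny Spanish stopword set for illustration only;
# a real list (for example NLTK's Spanish stopwords corpus) is much longer.
CUSTOM_STOPWORDS = {
    "de", "la", "el", "en", "y", "a", "los", "las",
    "un", "una", "es", "que", "para", "con", "por", "del", "se",
}

# ASCII punctuation plus the inverted marks used in Spanish.
CUSTOM_PUNCTUATIONS = set(string.punctuation) | {"¿", "¡"}


def extract_keywords(text, language=None):
    """Return ranked key phrases for `text`.

    If `language` is given, NLTK's stopword list for that language is used.
    Otherwise the custom stopword and punctuation sets above are passed in.
    """
    # Imported here so the constants above can be inspected even when
    # rake-nltk is not installed (pip install rake-nltk).
    from rake_nltk import Rake

    if language is not None:
        r = Rake(language=language)
    else:
        r = Rake(stopwords=CUSTOM_STOPWORDS, punctuations=CUSTOM_PUNCTUATIONS)

    r.extract_keywords_from_text(text)
    return r.get_ranked_phrases()
```

Calling `extract_keywords(text, language='spanish')` takes the first route and `extract_keywords(text)` the second. In both cases the text is still split by NLTK's tokenizers, so the caveat above about tokenizer support applies.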