LIAAD / yake

Single-document unsupervised keyword extraction
https://liaad.github.io/yake

Unsupported YAKE languages #58

Closed. Artgit closed this issue 2 years ago.

Artgit commented 2 years ago

Looks like the following languages are unsupported by YAKE:

TAGALOG
VIETNAMESE
BENGALI
BOKMAL
YORUBA
CZECH
SOTHO
URDU
PUNJABI
SWAHILI
ALBANIAN
BELARUSIAN
MACEDONIAN
AZERBAIJANI
AFRIKAANS
XHOSA
ICELANDIC
TAMIL
KAZAKH
MONGOLIAN
CATALAN
GEORGIAN
LATIN
MAORI
MALAY
NYNORSK
GUJARATI
TSWANA
BOSNIAN
ZULU
TELUGU
ESPERANTO
SERBIAN
SOMALI
TSONGA
GANDA
BASQUE
HEBREW
WELSH
THAI
IRISH
SHONA
KOREAN
MARATHI

Is there any particular reason why they are unsupported?

arianpasquali commented 2 years ago

No particular reason. They were just not tested.

In theory, the only language-specific resource YAKE needs is a list of stopwords. If you have a stopword list for your language, you can pass it through the stopwords argument; in that case the language argument is simply ignored.
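
As a rough illustration of the above, here is a minimal sketch of loading your own stopword list and handing it to the extractor. The helper names (`load_stopwords`, `build_extractor`) and the file name are hypothetical, and the one-word-per-line file format is an assumption; only `yake.KeywordExtractor` and its `stopwords`, `n`, and `top` arguments come from the library.

```python
from pathlib import Path


def load_stopwords(path):
    """Read a stopword file (assumed format: one word per line).

    Words are lowercased and blank lines are skipped.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def build_extractor(stopwords, max_ngram=3, top=10):
    """Create a YAKE extractor that uses a custom stopword collection.

    Hypothetical helper; requires the third-party package (pip install yake).
    """
    import yake  # imported here so load_stopwords works without yake installed

    # With stopwords supplied, no language needs to be specified.
    return yake.KeywordExtractor(n=max_ngram, top=top, stopwords=stopwords)
```

For example, `build_extractor(load_stopwords("stopwords_tl.txt")).extract_keywords(text)` should give a list of keyword/score pairs for a Tagalog `text`, where lower scores mean more relevant keywords. The order of the two items in each pair may differ between YAKE versions, so check the output before unpacking it.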

If you have access to an annotated dataset for any of these languages and want to contribute, please take a look at the evaluation datasets repository that we maintain at https://github.com/LIAAD/KeywordExtractor-Datasets.