LIAAD / yake

Single-document unsupervised keyword extraction
https://liaad.github.io/yake
Other
1.65k stars 229 forks

Will YAKE support Asian languages such as Korean or Japanese? #41

Closed: sysmetic closed this issue 2 years ago

sysmetic commented 3 years ago

I was really inspired by the YAKE framework; it is tremendously useful.

arianpasquali commented 3 years ago

Hi @sysmetic. In theory, yes, but we cannot promise the same performance for logographic languages as for phonographic ones.

However, the only thing you need to do is provide a stopword list for that language.

See the answer to a similar question here: https://github.com/LIAAD/yake/issues/40
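
For illustration, supplying a custom stopword list might look like the minimal sketch below. It assumes the `yake` package is installed and that `KeywordExtractor` accepts `lan`, `n`, `top`, and `stopwords` keyword arguments; the helper names and the `stopwords_ko.txt` file name are hypothetical.

```python
def parse_stopwords(text):
    """Split a one-stopword-per-line string into a list, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_extractor(stopwords, lan="ko", max_ngram=2, top=10):
    """Create a YAKE extractor that uses the given stopwords.

    Assumption: yake.KeywordExtractor takes lan, n, top, and stopwords
    as keyword arguments.
    """
    import yake  # imported lazily so parse_stopwords works without yake

    return yake.KeywordExtractor(lan=lan, n=max_ngram, top=top, stopwords=stopwords)


# Usage (hypothetical file name, requires yake):
# from pathlib import Path
# stopwords = parse_stopwords(Path("stopwords_ko.txt").read_text(encoding="utf-8"))
# extractor = build_extractor(stopwords)
# for keyword, score in extractor.extract_keywords(text):
#     print(keyword, score)
```

The quality of the result depends on how the text is tokenized as well as on the stopword list, so this is only a starting point for a language like Korean.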

sysmetic commented 3 years ago

As far as I know, Korean has no general stopword list because, like Japanese, it is an agglutinative language. The usual way to obtain stopwords is to run one or more morphological analyzers (https://konlpy.org/en/v0.4.3/morph/) and then filter their output by POS tag, treating the tokens whose tags mark them as function words in context as stopwords. If I am able to, I can provide you with a list of the Korean POS tags that act as stopwords.
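
The filtering step described above can be sketched roughly as follows. This is not tied to any particular analyzer: it assumes the analyzer returns `(token, tag)` pairs, and the tag prefixes (`J` for particles, `E` for endings) are an assumption based on Sejong-style tag names, which differ between analyzers. The sample input is hand-written for illustration, not real analyzer output.

```python
# Assumed tag prefixes: "J" for particles, "E" for verbal endings.
# Adjust these to the tag set of the analyzer actually used.
FUNCTION_TAG_PREFIXES = ("J", "E")


def stopwords_from_tagged(tagged_tokens, tag_prefixes=FUNCTION_TAG_PREFIXES):
    """Collect unique tokens whose POS tag starts with one of the prefixes.

    `tagged_tokens` is an iterable of (token, tag) pairs. Order of first
    appearance is preserved.
    """
    seen = set()
    stopwords = []
    for token, tag in tagged_tokens:
        if tag.startswith(tag_prefixes) and token not in seen:
            seen.add(token)
            stopwords.append(token)
    return stopwords


# Hand-written sample standing in for analyzer output.
sample = [("나", "NP"), ("는", "JX"), ("학교", "NNG"), ("에", "JKB"), ("는", "JX")]
print(stopwords_from_tagged(sample))
```

The resulting list could then be passed to YAKE as the stopword list for Korean.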

Thank you.

arianpasquali commented 3 years ago

Hi @sysmetic. Let me know if you managed to make it work for Korean.