dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Integrate MyStem (Russian stemmer) #1308

Closed: Horsmann closed this issue 5 years ago

Horsmann commented 5 years ago

Integration of a Russian stemmer: https://tech.yandex.ru/mystem/. It is closed-source and distributed as pre-compiled fat binaries that seem to include the model.

Non-profit/research use is permitted; commercial usage is restricted. The website is available in Russian only.

Horsmann commented 5 years ago

@reckart The stemmer is essentially complete, but I will definitely not replace tabs with spaces. Can this Checkstyle stuff be turned off?

reckart commented 5 years ago

Our style guidelines have required spaces instead of tabs for ages. Checkstyle just enforces that for a better experience. The problem with tabs is that different viewers use different tab widths (2, 4, 8, whatever), which makes the code look very different from one viewer to the next. Spaces do not have this problem. Please install the DKPro Core style file for Eclipse, or configure whatever IDE you are using accordingly.
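The tab-width point can be illustrated with a minimal Java sketch. The class `TabWidthDemo` and the helper `expandTabs` are hypothetical names used only for this illustration; they are not part of DKPro Core or Checkstyle. The sketch expands tabs to the next tab stop, the way a viewer would render them, and shows that the same tab-indented line ends up at a different column for each tab width.

```java
public class TabWidthDemo {

    // Hypothetical helper: replaces each tab with spaces up to the next
    // tab stop, mimicking how a viewer with the given tab width renders it.
    static String expandTabs(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (c == '\t') {
                int pad = tabWidth - (out.length() % tabWidth);
                for (int i = 0; i < pad; i++) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The same source line, indented with two tabs.
        String tabbed = "\t\treturn stem;";
        for (int width : new int[] {2, 4, 8}) {
            // The indentation grows with the viewer's tab width.
            System.out.println(width + ": [" + expandTabs(tabbed, width) + "]");
        }
    }
}
```

A line indented with spaces would be returned unchanged by `expandTabs` for every width, which is the property the style guideline relies on.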

reckart commented 5 years ago

Note that the style file does not take XML files into account - but they should also be formatted with spaces, using 2 space characters per indentation level.

Horsmann commented 5 years ago

ok, thx. This tool is only available for 64-bit systems. They dropped support for 32-bit systems in the latest release, so I only added the 64-bit binaries.
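Since only 64-bit binaries are bundled, a caller might want to fail early on other platforms. The following is a rough sketch under stated assumptions, not the check used in the actual integration: the class `ArchCheck` and method `is64Bit` are hypothetical, and looking for "64" in the `os.arch` system property is a heuristic. Note that `os.arch` describes the running JVM, which may be a 32-bit JVM on a 64-bit operating system.

```java
import java.util.Locale;

public class ArchCheck {

    // Hypothetical heuristic: treats any architecture name containing "64"
    // (e.g. "amd64", "x86_64", "aarch64") as 64-bit. This is an assumption,
    // not an exhaustive or authoritative mapping.
    static boolean is64Bit(String osArch) {
        if (osArch == null) {
            return false;
        }
        return osArch.toLowerCase(Locale.ROOT).contains("64");
    }

    public static void main(String[] args) {
        // Standard system property; reports the architecture of the JVM.
        String arch = System.getProperty("os.arch");
        if (!is64Bit(arch)) {
            throw new IllegalStateException(
                    "Only 64-bit binaries are bundled; detected architecture: " + arch);
        }
        System.out.println("64-bit platform detected: " + arch);
    }
}
```

A name containing "64" does not guarantee that a given pre-compiled binary runs on that platform, so a real check would also need to compare against the operating systems and architectures for which binaries were actually added.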