JuliaText / TextAnalysis.jl

Julia package for text analysis

Fix language specification in stemmer type #116

Closed nickto closed 5 years ago

nickto commented 5 years ago

Currently, the language for the stemmer is inferred using name(language(d)), where d is an ::AbstractDocument. This returns the name of the language in that language itself (e.g., "русский" for Russian). The Snowball stemmer, however, requires either the English name of the language or an ISO 639 code:

[...] The algorithm may be selected using the english name of the language, or using the 2 or 3 letter ISO 639 language codes. [...]

This PR fixes it by using english_name instead of name, thus producing, e.g., "russian" instead of "русский".

(Tested only on Russian)
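
To illustrate the mismatch, here is a minimal, self-contained sketch. It does not use the real Languages.jl or Snowball bindings: the Language types, the name and english_name methods, the SNOWBALL_ALGORITHMS set, and the stemmer_key helper are all hypothetical stand-ins defined only for this example, and the lowercasing step is an assumption about how the name is passed to the stemmer.

```julia
# Hypothetical stand-ins for language types; NOT the Languages.jl definitions.
abstract type Language end
struct Russian <: Language end
struct English <: Language end

# Name of the language in that language itself (what the old code used).
name(::Russian) = "русский"
name(::English) = "English"

# Name of the language in English (what this PR switches to).
english_name(::Russian) = "Russian"
english_name(::English) = "English"

# Illustrative subset of algorithm names a Snowball-style stemmer accepts.
const SNOWBALL_ALGORITHMS = Set(["english", "russian"])

# Hypothetical helper: build the lookup key from the English name.
stemmer_key(lang::Language) = lowercase(english_name(lang))

# Old behaviour: the native name is not a recognised algorithm name.
old_key = lowercase(name(Russian()))
println(old_key in SNOWBALL_ALGORITHMS)

# New behaviour: the English name is recognised.
new_key = stemmer_key(Russian())
println(new_key in SNOWBALL_ALGORITHMS)
```

For English the two names coincide, which is presumably why the problem only shows up for languages whose native name differs from the English one.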