JuliaText / TextAnalysis.jl

Julia package for text analysis

Stemming issue for certain words, e.g. providing -> provid #69

Closed: tk3369 closed this issue 11 months ago

tk3369 commented 6 years ago

Some words are not stemmed properly. It's probably a libstemmer issue, but that repo doesn't seem to be active, so I'm posting here :-)

julia> sm = TextAnalysis.stemmer_for_document(StringDocument("hello"))
Stemmer algorithm:english encoding:UTF_8

julia> stem(sm, "coming")
"come"

julia> stem(sm, "coding")
"code"

julia> stem(sm, "providing")
"provid"

julia> stem(sm, "improvising")
"improvis"

julia> stem(sm, "pursuing")
"pursu"
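The outputs above are consistent with the Porter2 ("Snowball English") algorithm, which produces stems rather than dictionary words. In its Step 1b, after "-ing" is removed, an "e" is added back when the stem ends in "at", "bl" or "iz", or when the stem is a "short word": it ends in a short syllable and its R1 region (everything after the first non-vowel that follows a vowel) is empty. "com" and "cod" qualify, so they become "come" and "code"; "provid", "improvis" and "pursu" have a non-empty R1, so they are left alone. The Julia snippet below is a minimal sketch of only that "-ing" branch. The function names are hypothetical, every other step of the algorithm is omitted (including the special handling of "y"), and it is not how libstemmer itself is written.

```julia
# Minimal sketch of the Porter2 (Snowball English) Step 1b rule for "-ing" only.
# Hypothetical helper names; all other steps of the algorithm are omitted, and
# words are assumed to be lowercase ASCII (no special "y"/"Y" marking).

const VOWELS = Set("aeiouy")
isvowel(c::Char) = c in VOWELS

# 1-based index where R1 starts: just after the first non-vowel that follows a
# vowel. A value of length(cs) + 1 means R1 is empty.
function r1_start(cs::Vector{Char})
    for i in 2:length(cs)
        if isvowel(cs[i-1]) && !isvowel(cs[i])
            return i + 1
        end
    end
    return length(cs) + 1
end

# Short syllable at the end of the word: non-vowel, vowel, non-vowel (the last
# one not w, x or Y), or a two-letter word made of a vowel then a non-vowel.
function ends_in_short_syllable(cs::Vector{Char})
    n = length(cs)
    if n == 2
        return isvowel(cs[1]) && !isvowel(cs[2])
    elseif n >= 3
        return !isvowel(cs[n-2]) && isvowel(cs[n-1]) &&
               !isvowel(cs[n]) && !(cs[n] in ('w', 'x', 'Y'))
    end
    return false
end

function step1b_ing(word::String)
    cs = collect(word)
    n = length(cs)
    # Only words ending in "ing" are handled here.
    (n > 3 && cs[n-2:n] == ['i', 'n', 'g']) || return word
    stem = cs[1:n-3]
    # The suffix is removed only if the preceding part contains a vowel.
    any(isvowel, stem) || return word
    s = String(stem)
    p1 = r1_start(cs)  # R1 is computed on the original word
    if endswith(s, "at") || endswith(s, "bl") || endswith(s, "iz")
        return s * "e"                  # e.g. "rating" -> "rate"
    elseif length(stem) >= 2 && stem[end] == stem[end-1] && stem[end] in "bdfgmnprt"
        return s[1:end-1]               # e.g. "running" -> "run"
    elseif p1 > length(stem) && ends_in_short_syllable(stem)
        return s * "e"                  # short word: "com" -> "come"
    end
    return s                            # e.g. "providing" -> "provid"
end

for w in ["coming", "coding", "providing", "improvising", "pursuing"]
    println(w, " -> ", step1b_ing(w))
end
```

For these five words the sketch returns the same strings as the REPL session, because the later Porter2 steps happen not to change them; for other words the full algorithm can strip further suffixes.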

aviks commented 6 years ago

Not sure what we can do about this. Everyone just seems to use the Snowball stemmer.

rssdev10 commented 11 months ago

See https://snowballstem.org/ and the Julia wrapper https://github.com/JuliaText/Snowball.jl