Open ththvseo opened 7 years ago
test program (originally by @nightlyone):
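package main

import (
	"flag"
	"fmt"

	"github.com/akhenakh/hunspellgo"
)

func main() {
	// -d selects the dictionary, e.g. en_US
	lang := flag.String("d", "en_US", "language")
	flag.Parse()

	// load the .aff/.dic pair from the system hunspell directory
	h := hunspellgo.Hunspell("/usr/share/hunspell/"+*lang+".aff", "/usr/share/hunspell/"+*lang+".dic")

	// print the stems of the first positional argument
	fmt.Println(h.Stem(flag.Arg(0)))
}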
Not sure why I have done that ... Here is the C header https://github.com/ropensci/hunspell/blob/master/src/hunspell/hunspell.h#L94
I believe this is the expected behaviour: stem only returns what it can stem. Compare the hunspell CLI:
house
*
houses
+ house
But it CAN stem "houses"; it just returns a single result, which is then dropped by that code.
$ hunspell --version
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.4.0)
$ hunspell -s
input> housing
housing housing
housing house
input> houses
houses house
input> house
house house
As for why it returns a copy of the input for some words ("housing") but not others ("house"), I have no idea. I already looked at the C code of stem(), but what it does there is neither obvious nor documented.
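To make the effect of the length check concrete, here is a minimal sketch in plain Go. It does not call hunspell at all: dropSingle and keepAll are hypothetical helpers, and dropSingle only models the behaviour described in this thread (a result list of length 1 is discarded), it is not the actual hunspellgo source. The raw results are copied from the hunspell -s session above.

```go
package main

import "fmt"

// dropSingle models the behaviour discussed in this thread (an assumption,
// not the real hunspellgo code): a result list of length 1 is thrown away.
func dropSingle(raw []string) []string {
	if len(raw) == 1 {
		return nil
	}
	return raw
}

// keepAll models the same function with the length==1 check removed.
func keepAll(raw []string) []string {
	return raw
}

func main() {
	// Raw stem lists as printed by `hunspell -s` in the session above.
	cases := []struct {
		word string
		raw  []string
	}{
		{"housing", []string{"housing", "house"}},
		{"houses", []string{"house"}},
		{"house", []string{"house"}},
	}

	for _, c := range cases {
		fmt.Printf("%-8s with check: %v  without check: %v\n",
			c.word, dropSingle(c.raw), keepAll(c.raw))
	}
}
```

Under that assumption, "houses" and "house" lose their only stem when the check is present, while "housing" keeps both of its results either way.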
Probably a bug then; by comparison, https://github.com/nathanjsweet/gohun/blob/master/gohun.go#L118 seems to work as expected.
I'd encourage you to use another hunspell library, since this one is not really maintained anymore ...
thanks for suggesting an alternative, will test that package after the holidays.
Can anybody explain why Stem() doesn't use the result if its length is 1?
https://github.com/akhenakh/hunspellgo/blame/master/hunspellgo.go#L84
This breaks some cases (this is with en_US as included in Debian):
with the length==1 check removed: