akhenakh / hunspellgo

Hunspell bindings for Golang

why does Stem() not use the result if length==1 ? #2

Open ththvseo opened 7 years ago

ththvseo commented 7 years ago

can anybody explain why Stem() doesn't use the result if its length is 1?

https://github.com/akhenakh/hunspellgo/blame/master/hunspellgo.go#L84

func (handle *Hunhandle) Stem(word string) []string {
    // ... (setup elided)
    length = C.Hunspell_stem(handle.handle, &carray, wordcs)
    if int(length) == 1 {
        return []string{} // a single result is discarded here
    }
    // ... (rest elided)
}

this breaks some cases (this is with en_US as included in Debian):

$ go run example.go housing
[housing house]
$ go run test.go houses
[]

with the length==1 check removed:

$ go run test.go houses
[house]
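
The difference between the two runs above can be reproduced without hunspell installed. This is only a minimal sketch: the cgo call is replaced by a plain slice of results (a hypothetical stand-in, not the library's code), and `stemBuggy` and `stemFixed` are illustrative names that do not exist in hunspellgo. It assumes the intent of the check was to return an empty slice when there are no results.

```go
package main

import "fmt"

// stemBuggy mirrors the check quoted above: a single result is discarded.
// results is a hypothetical stand-in for what the C call would produce.
func stemBuggy(results []string) []string {
	if len(results) == 1 {
		return []string{}
	}
	return results
}

// stemFixed returns an empty slice only when there are no results at all.
// This is an assumed fix, not the maintainer's code.
func stemFixed(results []string) []string {
	if len(results) == 0 {
		return []string{}
	}
	return results
}

func main() {
	// Result lists taken from the outputs shown in this thread.
	housing := []string{"housing", "house"}
	houses := []string{"house"}

	fmt.Println(stemBuggy(housing)) // [housing house]
	fmt.Println(stemBuggy(houses))  // []
	fmt.Println(stemFixed(houses))  // [house]
}
```

With the stand-in data, the single-result case ("houses") is the only one where the two versions differ.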

ththvseo commented 7 years ago

test program (originally by @nightlyone):

package main

import (
        "flag"
        "fmt"

        "github.com/akhenakh/hunspellgo"
)

func main() {
        lang := flag.String("d", "en_US", "language")
        flag.Parse()
        h := hunspellgo.Hunspell("/usr/share/hunspell/"+*lang+".aff", "/usr/share/hunspell/"+*lang+".dic")
        fmt.Println(h.Stem(flag.Arg(0)))
}

akhenakh commented 7 years ago

Not sure why I did that ... Here is the C header: https://github.com/ropensci/hunspell/blob/master/src/hunspell/hunspell.h#L94

akhenakh commented 7 years ago

I believe this is the expected behaviour: stem only returns what it can stem. Compare the hunspell CLI:

house
*

houses
+ house

ththvseo commented 7 years ago

but it CAN stem "houses"; it just returns only a single result, which is then dropped by that code.

$ hunspell --version
@(#) International Ispell Version 3.2.06 (but really Hunspell 1.4.0)
$ hunspell -s
input> housing
housing housing
housing house

input> houses
houses house

input> house
house house

ththvseo commented 7 years ago

as for why it returns a copy of the input for some words ("housing") but not others ("houses"), i have no idea. i looked at the C code of stem() already, but what it does there is neither obvious nor documented.

akhenakh commented 7 years ago

Probably a bug then. By comparison, https://github.com/nathanjsweet/gohun/blob/master/gohun.go#L118 seems to work as expected.

I'd suggest using another hunspell library, since this one is not really maintained anymore ...

ththvseo commented 7 years ago

thanks for suggesting an alternative, will test that package after the holidays.