Closed by snomos 1 year ago
echo gouvdageainnus | hfst-ospell -l 'acceptor.default.hfst' -m 'errmodel.default.hfst' -S -n 1
"gouvdageainnus" is NOT in the lexicon:
Corrections for "gouvdageainnus":
bruvdageainnus 47.296875
echo Gouvdageainnus | hfst-ospell -l 'acceptor.default.hfst' -m 'errmodel.default.hfst' -S -n 1
"Gouvdageainnus" is NOT in the lexicon:
Corrections for "Gouvdageainnus":
Guovdageainnus -0.703125
So hfst-ospell gives the expected results, though with differing weights.
printf '"<gouvdageainnus>"\n\t"guovdageainnus" ?\n' | divvun-cgspell -n 1 -b 15 -w 5000 -l acceptor.default.hfst -m errmodel.default.hfst
"<gouvdageainnus>"
"guovdageainnus" ?
"geaidnu" N Sem/Route Sg Loc <W:47.2969> <WA:27.2969> <spelled> "<bruvdageainnus>"
"bruvda" N Sem/Dummytag Cmp/SgNom Cmp
"geainnus" N Sem/Route Sg Nom <W:47.2969> <WA:27.2969> <spelled> "<bruvdageainnus>"
"bruvda" N Sem/Dummytag Cmp/SgNom Cmp
"geaidnu" N Sem/Route Sg Gen PxSg3 <W:47.2969> <WA:30.2969> <spelled> "<bruvdageainnus>"
"bruvda" N Sem/Dummytag Cmp/SgNom Cmp
"geaidnu" N Sem/Route Sg Acc PxSg3 <W:47.2969> <WA:30.2969> <spelled> "<bruvdageainnus>"
"bruvda" N Sem/Dummytag Cmp/SgNom Cmp
And with the lowercased input, cgspell gives the same suggestion as hfst-ospell.
printf '"<Gouvdageainnus>"\n\t"Guovdageainnus" ?\n' | divvun-cgspell -n 1 -b 15 -w 5000 -l acceptor.default.hfst -m errmodel.default.hfst
"<Gouvdageainnus>"
"Guovdageainnus" ?
The invocation from smegram.mode gives nothing for the capitalised input, but it has this -b 15 option, and I don't know exactly how it works. If we change that to -b 20:
printf '"<Gouvdageainnus>"\n\t"Guovdageainnus" ?\n' | divvun-cgspell -n 1 -b 20 -w 5000 -l acceptor.default.hfst -m errmodel.default.hfst
"<Gouvdageainnus>"
"Guovdageainnus" ?
"Guovdageaidnu" N Prop Sem/Plc Sg Loc <W:-0.703125> <WA:17.2969> <spelled> "<Guovdageainnus>"
Then it seems to give the expected result – does it run much slower?
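One way to check the speed question would be to time both beam settings directly. Below is a minimal sketch, assuming the same model files and flags as in the commands above; the helper names `cgspell_cmd` and `time_command` are made up for illustration and are not part of any Divvun tool.

```python
import subprocess
import time

# The cohort fed to divvun-cgspell in the commands above.
COHORT = '"<Gouvdageainnus>"\n\t"Guovdageainnus" ?\n'


def cgspell_cmd(beam):
    """Build the divvun-cgspell command line used above, varying only -b."""
    return ["divvun-cgspell", "-n", "1", "-b", str(beam), "-w", "5000",
            "-l", "acceptor.default.hfst", "-m", "errmodel.default.hfst"]


def time_command(cmd, stdin_text, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs of cmd."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, input=stdin_text, text=True,
                       capture_output=True, check=True)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage (needs divvun-cgspell and the two .hfst files):
# for beam in (15, 20):
#     print(beam, time_command(cgspell_cmd(beam), COHORT))
```

A single-word input will probably measure mostly startup and model loading, so a longer list of misspelled cohorts on stdin would likely give a more meaningful comparison.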
The -b option (short for beam) sets a limit on the maximum weight difference between the best and the worst suggestions: 15 with the original setting. What I don't get is that the weight -0.703125 from both hfst-ospell and divvun-cgspell is far lower than anything else, and the distance to the next suggestion is much more than 15.
Could it be that there is a bug with the mathematics somewhere, such that negative weights are not properly handled?
Setting -b 20 should be no problem, though.
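To make that reading of -b concrete, here is a small sketch of a beam applied as a filter on the final suggestion list. This is only my understanding as described above, not the actual hfst-ospell or divvun-cgspell code; the function name and the runner-up entry are hypothetical.

```python
def within_beam(suggestions, beam):
    """Keep suggestions whose weight is at most `beam` above the best (lowest) weight."""
    if not suggestions:
        return []
    best = min(weight for _, weight in suggestions)
    return [(form, weight) for form, weight in suggestions
            if weight - best <= beam]


suggestions = [
    ("Guovdageainnus", -0.703125),       # weight reported by both tools above
    ("hypothetical-runner-up", 47.0),    # made-up weight, for illustration only
]

print(within_beam(suggestions, 15))  # [('Guovdageainnus', -0.703125)]
print(within_beam(suggestions, 20))  # [('Guovdageainnus', -0.703125)]
```

Under this reading the best suggestion is always within the beam of itself, so it would survive both -b 15 and -b 20. That does not match what we see with -b 15, which is why I suspect either the beam works differently or negative weights are mishandled somewhere.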
So not beam as in https://en.wikipedia.org/wiki/Beam_search ? (That would actually explain it)
Not as I have understood it. But the whole beam search option was added by S Hardwick, so you had better ask him for the technical details :)
This is still a problem. Here are some more examples:
echo Servodaas | ./modes/trace-smegramrelease.mode
"<Servodaas>"
"Servodaas" N Prop Sem/Plc Sg Loc Guess <LastCohort> <firstCohort> @HNOUN SUBSTITUTE:3417 MAP:23080:hnounAdvl
:\n
echo servodaas | ./modes/trace-smegramrelease.mode
"<servodaas>"
"servodaas" ? <LastCohort> <firstCohort> &typo ADD:10126:uncorrected-typos
typo
:\n
Compare with two different spellers, with both initial upper and lower case:
echo Servodaas | divvunspell suggest -a tools/spellcheckers/se-desktop.zhfst
Reading from stdin...
Input: Servodaas [INCORRECT]
Servodagas 48.59303
Servvodagas 66.203186
Servotbas 78.3018
Serrodagas 80.3018
Servodatbas 80.3018
Servošabas 80.3018
Servodaga 83.17057
Servodat 84.31137
Servodagat 85.399826
Servobas 92.3018
echo servodaas | divvunspell suggest -a tools/spellcheckers/se-desktop.zhfst
Reading from stdin...
Input: servodaas [INCORRECT]
servodagas 33.59303
servvodagas 51.203186
servotbas 63.3018
serrodagas 65.3018
servodatbas 65.3018
servošabas 65.3018
servodaga 68.17057
servodat 69.31137
servodagat 70.399826
servobas 77.3018
echo '5 Servodaas' | hfst-ospell-office tools/spellcheckers/se-desktop.zhfst
@@ hfst-ospell-office is alive
& Servvodagas Servodagas Servodaga Servodat Servodagat
echo '5 servodaas' | hfst-ospell-office tools/spellcheckers/se-desktop.zhfst
@@ hfst-ospell-office is alive
& servodagas servvodagas servodaga servodat servodagat
That is, the spellers have no problems giving reasonable suggestions, but nothing pops up in the grammar checker.
I can't reproduce – was this fixed in a different issue?
$ echo Servodaas | ./modes/trace-smegramrelease.mode
"<Servodaas>"
"servodat" v1 N Sem/Org Sg Loc <W:48.2094> <WA:8.20939> <spelled> "servodagas"S PROTECT:3480 SELECT:3715 &SUGGESTWF &typo ADD:10118:spelled
typo
; "servodat" v1 N Sem/Org Sg Gen PxSg3 <W:48.2094> <WA:21.2094> <spelled> "servodagas"S PROTECT:3480 SELECT:3715 REMOVE:1296
; "servodat" v1 N Sem/Org Sg Acc PxSg3 <W:48.2094> <WA:21.2094> <spelled> "servodagas"S PROTECT:3480 SELECT:3715 REMOVE:1296
; "Servodaas" N Prop Sem/Plc Sg Loc Guess <LastCohort> <firstCohort> SUBSTITUTE:3423 SELECT:3715
:\n
$ echo Servodaas|divvun-checker -l se |jq .
{
  "errs": [
    [
      "Servodaas",
      0,
      9,
      "typo",
      "Ii leat sátnelisttus",
      [
        "Servodagas"
      ],
      "Čállinmeattáhus"
    ]
  ],
  "text": "Servodaas"
}
$ echo mas Gouvdageainnus eai beasa|divvun-checker -l se |jq .
{
  "errs": [
    [
      "Gouvdageainnus",
      4,
      18,
      "typo",
      "Ii leat sátnelisttus",
      [
        "Guovdageainnus",
        "Govdageainnus",
        "Ovdageainnus",
        "Ruvdageainnus",
        "Hoavdageainnus",
        "Bruvdageainnus",
        "Buvdageainnus",
        "Soundageainnus"
      ],
      "Čállinmeattáhus"
    ]
  ],
  "text": "mas Gouvdageainnus eai beasa"
}
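For comparing runs like these without reading the raw JSON, a short script can pull out the flagged forms and their suggestions. This is a sketch based only on the output shown here: each entry of `errs` appears to be a seven-element array, and the field names I give the elements are my own labels, not documented names.

```python
import json


def list_errors(checker_json):
    """Yield (form, start, end, error_type, suggestions) for each reported error.

    Assumes each entry of "errs" has the seven elements seen in the output
    above: form, start, end, type, message, suggestions, title.
    """
    report = json.loads(checker_json)
    for form, start, end, err_type, _message, suggestions, _title in report["errs"]:
        yield form, start, end, err_type, suggestions


sample = '''
{"errs": [["Servodaas", 0, 9, "typo", "Ii leat sátnelisttus",
           ["Servodagas"], "Čállinmeattáhus"]],
 "text": "Servodaas"}
'''

for form, start, end, err_type, suggestions in list_errors(sample):
    print(f"{form} [{start}:{end}] {err_type}: {', '.join(suggestions)}")
# prints: Servodaas [0:9] typo: Servodagas
```

An empty `errs` list then means the checker flagged nothing, which is the situation originally reported for the capitalised forms.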
This seems to be fixed; I get the same results as you:
echo Servodaas | divvun-checker -a se.zcheck | jq .
{
  "errs": [
    [
      "Servodaas",
      0,
      9,
      "typo",
      "Ii leat sátnelisttus",
      [
        "Servodagas"
      ],
      "Čállinmeattáhus"
    ]
  ],
  "text": "Servodaas"
}
And:
echo mas Gouvdageainnus eai beasa | divvun-checker -a se.zcheck | jq .
{
  "errs": [
    [
      "Gouvdageainnus",
      4,
      18,
      "typo",
      "Ii leat sátnelisttus",
      [
        "Guovdageainnus",
        "Bruvdageainnus",
        "Soundageainnus",
        "Buvdageainnus",
        "Ruvdageainnus",
        "Govdageainnus",
        "Hoavdageainnus",
        "Ovdageainnus"
      ],
      "Čállinmeattáhus"
    ]
  ],
  "text": "mas Gouvdageainnus eai beasa"
}
It looks like the grammar checker skips the speller if the input word starts with a capital:
This seems a bit too restrictive. Could that be changed?