divvun / libdivvun

lib for running gramcheck and other pipelines + cli; modules for CG→spelling, CG→feedback, tagging blanks
https://giellalt.github.io/proof/gramcheck/GrammarCheckerDocumentation.html
GNU General Public License v3.0

divvun-suggest -j mangles speller suggestions #2

Closed snomos closed 6 years ago

snomos commented 6 years ago

Given the following command:

$ echo "Muhto go beaivi jávkkai vári duohkai ja giđđaija ilbmi luoitáldii gilli badjel, vázzái son viidáseappot davasguvlui vuovddis." | hfst-tokenise --giella-cg tokeniser-gramcheck-gt-desc.pmhfst | vislcg3 -g mwe-dis.bin | cg-mwesplit | divvun-cgspell -a se.zhfst | vislcg3 -g disambiguator.bin | vislcg3 -g grammarchecker.bin | divvun-suggest -g generator-gt-norm.hfstol -m errors.xml -j

the output is the following (with line breaks added for readability):

{"errs":
  [
   ["ja",37,39,"default","default",[", ja"]],
   ["luoitáldii",55,65,"typo","typo",["luoitádit"]],
   ["gilli",66,71,"msyn-gen-before-postp","Iskka genitiivva alege nominatiivva",["gili"]],
   ["davasguvlui",104,115,"typo","typo",["","davveguvlui","davviguvlui","divaguvlui","divatguvlui","lagasguvlui"]]
  ],
  "text":
  "Muhto go beaivi jávkkai vári duohkai ja giđđaija ilbmi luoitáldii gilli badjel, vázzái son viidáseappot davasguvlui vuovddis.\n"
}

Compare the suggestion list for the spelling error davasguvlui with the output from the following command:

$ echo "Muhto go beaivi jávkkai vári duohkai ja giđđaija ilbmi luoitáldii gilli badjel, vázzái son viidáseappot davasguvlui vuovddis." | hfst-tokenise --giella-cg tokeniser-gramcheck-gt-desc.pmhfst | vislcg3 -g mwe-dis.bin | cg-mwesplit | divvun-cgspell -a se.zhfst | vislcg3 -g disambiguator.bin | vislcg3 -g grammarchecker.bin | divvun-suggest -g generator-gt-norm.hfstol -m errors.xml

Output:

...
"<viidáseappot>"
    "viiddis" A Comp Attr Err/Orth <W:0.0000000000> @>N #17->17
    "viidáseappot" v1 Adv Comp <W:0.0000000000> @<ADVL #17->17
: 
"<davasguvlui>"
    "davasguvlui" ? #18->18
    "davás guvlui" Adv <W:9.30176> <WA:15.3018> <spelled> "<davás guvlui>" @<ADVL &SUGGESTWF &typo #18->18
typo
    "divatguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<divatguvlui>" @<ADVL &SUGGESTWF &typo #18->18
typo
    "davviguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<davviguvlui>" @<ADVL &SUGGESTWF &typo #18->18
typo
    "davveguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<davveguvlui>" @<ADVL &SUGGESTWF &typo #18->18
typo
    "divaguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<divaguvlui>" @<ADVL &SUGGESTWF &typo #18->18
typo
    "lagasguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<lagasguvlui>" @<ADVL &SUGGESTWF &typo #18->18
typo
: 
"<vuovddis>"
    "vuovdi" N Sem/Plc Sg Loc <W:0.0000000000> @<ADVL #19->19
...
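
For reference, the suggested word forms can be read off the CG stream in the order the pipeline emits them. The following is only a minimal sketch, assuming that each suggestion appears as a `"<...>"` word form directly after the `<spelled>` tag, as in the readings above; the function name and the regular expression are illustrative and not part of libdivvun.

```python
import re

# Excerpt of the CG stream shown above (readings for "davasguvlui" only).
CG_STREAM = '''"<davasguvlui>"
    "davasguvlui" ? #18->18
    "davás guvlui" Adv <W:9.30176> <WA:15.3018> <spelled> "<davás guvlui>" @<ADVL &SUGGESTWF &typo #18->18
    "divatguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<divatguvlui>" @<ADVL &SUGGESTWF &typo #18->18
    "davviguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<davviguvlui>" @<ADVL &SUGGESTWF &typo #18->18
    "davveguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<davveguvlui>" @<ADVL &SUGGESTWF &typo #18->18
    "divaguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<divaguvlui>" @<ADVL &SUGGESTWF &typo #18->18
    "lagasguovlu" N Sg Ill <W:35.3018> <WA:15.3018> <spelled> "<lagasguvlui>" @<ADVL &SUGGESTWF &typo #18->18
'''

# Assumption: a suggestion is the word form quoted right after <spelled>.
SUGGESTION = re.compile(r'<spelled> "<([^>]*)>"')


def suggestions_in_stream_order(cg_text):
    """Return the suggested word forms in the order they occur in the CG stream."""
    found = []
    for line in cg_text.splitlines():
        # Readings are indented; cohort lines such as "<davasguvlui>" are not.
        if line.startswith((" ", "\t")):
            found.extend(SUGGESTION.findall(line))
    return found


if __name__ == "__main__":
    for suggestion in suggestions_in_stream_order(CG_STREAM):
        print(repr(suggestion))
```

In this excerpt the first suggestion is the two-word form `davás guvlui`, followed by five single-word forms that are not in alphabetical order.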

It seems that the JSON formatting first drops the text of any suggestion that contains a space (davás guvlui comes out as an empty string), and then sorts the suggestion list alphabetically instead of keeping the order from the speller. Both are of course unwanted :-)
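
Both symptoms can be checked mechanically in the `-j` output. The following is a rough sketch, assuming each entry in `errs` has the six-element layout seen above; the field names and the helper function are made up for illustration and are not libdivvun API. Note that the ordering check is only a heuristic, since a correctly ranked list could happen to be alphabetical as well.

```python
import json

# The JSON output quoted above (raw string, so that \n stays a JSON escape).
OUTPUT = r'''{"errs":
  [
   ["ja",37,39,"default","default",[", ja"]],
   ["luoitáldii",55,65,"typo","typo",["luoitádit"]],
   ["gilli",66,71,"msyn-gen-before-postp","Iskka genitiivva alege nominatiivva",["gili"]],
   ["davasguvlui",104,115,"typo","typo",["","davveguvlui","davviguvlui","divaguvlui","divatguvlui","lagasguvlui"]]
  ],
  "text":
  "Muhto go beaivi jávkkai vári duohkai ja giđđaija ilbmi luoitáldii gilli badjel, vázzái son viidáseappot davasguvlui vuovddis.\n"
}'''


def suspicious_suggestions(output):
    """Map each flagged word form to the list of problems found in its suggestions."""
    report = {}
    # Field names are illustrative; only the positions are taken from the output.
    for form, _start, _end, _err_id, _message, suggestions in json.loads(output)["errs"]:
        problems = []
        if "" in suggestions:
            problems.append("empty suggestion")
        # Heuristic: a list of two or more suggestions in sorted order.
        if len(suggestions) > 1 and suggestions == sorted(suggestions):
            problems.append("alphabetical order")
        if problems:
            report[form] = problems
    return report


if __name__ == "__main__":
    print(suspicious_suggestions(OUTPUT))
```

For the output above, only the entry for davasguvlui is flagged, and it is flagged for both problems.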

This behavior is also seen in the divvun-checker program.