Date: 2020-12-18 11:43:49 +0100
From: Linda Wiechetek
Here we have a compound error: "Álgoálbmot nissonat" should be written as one word, but the suggested correction does not capitalize the first letter. What can we do?
Here is the sentence:
Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.
"<Álgoálbmot nissonat>"
"álgoálbmotnisu" v1 N Sem/Hum Pl Nom Err/SpaceCmp
Date: 2021-02-16 14:34:51 +0100
From: Tommi A Pirinen
I tried editing divvun-suggest, but I'm not sure that is the right place, so I'm waiting for Kevin's feedback on GitHub:
echo Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui. |
  hfst-tokenise -g tokeniser-gramcheck-gt-desc.pmhfst |
  divvun-blanktag analyser-gt-whitespace.hfst |
  vislcg3 --trace -g valency.bin |
  vislcg3 --trace -g mwe-dis.bin |
  cg-mwesplit |
  divvun-blanktag analyser-gt-errorwhitespace.hfst |
  divvun-cgspell -n 10 -b 15 -w 5000 -u 0.4 -l acceptor.default.hfst -m errmodel.default.hfst |
  vislcg3 --trace -g grc-disambiguator.bin |
  vislcg3 --trace -g spellchecker.bin |
  vislcg3 --trace -g after-speller-disambiguator.bin |
  vislcg3 --trace -g grammarchecker.cg3 |
  ~/github/divvun/libdivvun/src/divvun-suggest -g generator-gramcheck-gt-norm.hfstol -m errors.xml |
  head -n 75
"<Álgoálbmot nissonat>"
"álgoálbmotnisu" v1 N Sem/Hum Pl Nom Err/SpaceCmp
It's possible that this breaks everything else...
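For illustration, the casing behaviour under discussion could be sketched as below. This is a minimal sketch, not the actual divvun-suggest code: `transfer_casing` is a hypothetical helper, and the lowercase form `álgoálbmotnissonat` is assumed to be what the generator produces before any casing is applied.

```python
def transfer_casing(original: str, suggestion: str) -> str:
    """Copy the casing pattern of `original` onto `suggestion`.

    Hypothetical helper for illustration only.
    """
    if not original or not suggestion:
        return suggestion
    letters = [ch for ch in original if ch.isalpha()]
    # All-caps input with more than one letter: upper-case the whole suggestion.
    if len(letters) > 1 and all(ch.isupper() for ch in letters):
        return suggestion.upper()
    # Capitalized input: upper-case only the first character.
    if original[0].isupper():
        return suggestion[0].upper() + suggestion[1:]
    # Otherwise leave the suggestion as generated.
    return suggestion


# Assumed generator output (lowercase) for the misspelled compound.
fixed = transfer_casing("Álgoálbmot nissonat", "álgoálbmotnissonat")
print(fixed)
```

The sketch looks only at the surface form of the error and the generated suggestion, so it would not need to know which rule produced the suggestion.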
Date: 2021-02-16 18:06:01 +0100
From: Linda Wiechetek
Great! And oh no... I would like to have Sjur's comments on this as well.
Date: 2021-02-16 21:23:40 +0100
From: Kevin Brubeck Unhammer <<unhammer+apertium>>
Casing was already applied in the json and library modes (used by web, LO):
$ echo 'Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.' | bash smegramj.mode | jq .
{
  "errs": [
    [
      "Álgoálbmot nissonat",
      0,
      19,
      "msyn-compound",
      "\"Álgoálbmot nissonat\" orru leamen goallossátni",
      [
        "Álgoálbmotnissonat"
      ],
      "Goallosteapmi"
    ]
  ],
  "text": "Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.\n"
}
The casing is just not applied in the CG output. Tommi's fix changes that for CG, though the CG output is still rather low-level, since more happens to the suggestions in the JSON output (e.g. expanding underlines based on relations). Since that happens only once a full sentence has been processed, perhaps we could output the full suggestions in a "comment" after the period in the CG output, maybe something like
[…other cohorts…]
"
:\n
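As a rough illustration of how a client might consume the JSON output shown above, here is a minimal sketch. It assumes each entry in `errs` has the seven positions seen in that output (form, start, end, error type, message, suggestions, title) and that the offsets can be used directly as Python string indices. Both are assumptions read off this one example, not a documented contract, and `apply_first_suggestions` is a hypothetical helper.

```python
import json

# Copy of the JSON output shown above (raw string so the JSON escapes survive).
RAW = r'''
{"errs": [["Álgoálbmot nissonat", 0, 19, "msyn-compound",
           "\"Álgoálbmot nissonat\" orru leamen goallossátni",
           ["Álgoálbmotnissonat"], "Goallosteapmi"]],
 "text": "Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.\n"}
'''


def apply_first_suggestions(report: dict) -> str:
    """Replace each flagged span with its first suggestion (hypothetical helper)."""
    text = report["text"]
    # Work from right to left so earlier offsets stay valid after a replacement.
    for err in sorted(report["errs"], key=lambda e: e[1], reverse=True):
        form, beg, end, err_type, message, suggestions, title = err
        if suggestions:
            text = text[:beg] + suggestions[0] + text[end:]
    return text


corrected = apply_first_suggestions(json.loads(RAW))
print(corrected)
```

Because the suggestion in the JSON output already carries the capital letter, a client like this needs no casing logic of its own.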
Works with current smegram
This issue was created automatically with bugzilla2github
Bugzilla Bug 2712
Date: 2020-12-18T11:43:49+01:00 From: Linda Wiechetek <>
To: Tommi A Pirinen <>
CC: linda.wiechetek, sjur.n.moshagen, trond.trosterud, unhammer+apertium
Last updated: 2021-02-16T21:23:40+01:00