giellalt / bugzilla-dummy


Suggesting words with capitalized first letters does not seem to work (Bugzilla Bug 2712) #1777

Closed: albbas closed this 2 months ago

albbas commented 3 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 2712

Date: 2020-12-18T11:43:49+01:00 From: Linda Wiechetek <> To: Tommi A Pirinen <> CC: linda.wiechetek, sjur.n.moshagen, trond.trosterud, unhammer+apertium

Last updated: 2021-02-16T21:23:40+01:00

albbas commented 3 years ago

Comment 14179

Date: 2020-12-18 11:43:49 +0100 From: Linda Wiechetek <>

Here we have a compound error: "Álgoálbmot nissonat" should be written as one word, but the suggested correction does not keep the capitalized first letter. What can we do?

Here is the sentence:

Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.

"<Álgoálbmot nissonat>"
	"álgoálbmotnisu" v1 N Sem/Hum Pl Nom Err/SpaceCmp SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &msyn-compound #1->1 ADD:3824:msyn-compound ADD:3824:msyn-compound ADD:3824:msyn-compound msyn-compound
	"álgoálbmotnisu" v1 N Sem/Hum Pl Nom SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &SUGGEST #1->1 ADD:3824:msyn-compound ADD:3824:msyn-compound COPY:3842:compound álgoálbmotnisu+v1+N+Pl+Nom álgoálbmotnissonat
	"álgoálbmotnisu" v1 N Sem/Hum Pl Nom SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &SUGGEST #1->1 ADD:3824:msyn-compound COPY:3842:compound álgoálbmotnisu+v1+N+Pl+Nom álgoálbmotnissonat
	"álgoálbmotnisu" v1 N Sem/Hum Pl Nom SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &SUGGEST #1->1 ADD:3824:msyn-compound ADD:3824:msyn-compound ADD:3824:msyn-compound COPY:3842:compound álgoálbmotnisu+v1+N+Pl+Nom álgoálbmotnissonat
	"álgoálbmotnisu" v2 N Sem/Hum Pl Nom Err/SpaceCmp SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &msyn-compound #1->1 ADD:3824:msyn-compound ADD:3824:msyn-compound ADD:3824:msyn-compound msyn-compound
	"álgoálbmotnisu" v2 N Sem/Hum Pl Nom SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &SUGGEST #1->1 ADD:3824:msyn-compound COPY:3842:compound álgoálbmotnisu+v2+N+Pl+Nom álgoálbmotnissonat
	"álgoálbmotnisu" v2 N Sem/Hum Pl Nom SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &SUGGEST #1->1 ADD:3824:msyn-compound ADD:3824:msyn-compound ADD:3824:msyn-compound COPY:3842:compound álgoálbmotnisu+v2+N+Pl+Nom álgoálbmotnissonat
	"álgoálbmotnisu" v2 N Sem/Hum Pl Nom SELECT:2993:fallback SELECT:2993:fallback @SUBJ> SELECT:18046:r2339 SELECT:6094:r1025 MAP:23614 &SUGGEST #1->1 ADD:3824:msyn-compound ADD:3824:msyn-compound COPY:3842:compound álgoálbmotnisu+v2+N+Pl+Nom álgoálbmotnissonat
;	"álgoálbmotnisu" v1 N Sem/Hum Sg Acc PxSg2 Err/SpaceCmp SELECT:2993:fallback SELECT:2993:fallback SELECT:18046:r2339
;	"álgoálbmotnisu" v1 N Sem/Hum Sg Gen PxSg2 Err/SpaceCmp SELECT:2993:fallback SELECT:2993:fallback REMOVE:17789:r2285
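To make the problem easier to see, here is a small Python sketch (not part of Divvun) that pulls the suggested forms out of one cohort in this trace format. It assumes, based only on the output above, that readings tagged &SUGGEST end with the generated form and that lines starting with ";" are removed readings. The helper name suggestions_from_cohort is hypothetical, and the sample cohort has most trace tags trimmed.

```python
def suggestions_from_cohort(cg_text: str) -> list:
    """Collect the generated forms of &SUGGEST readings in one CG cohort.

    Hypothetical helper. Assumes (from the trace above) that readings are
    indented lines starting with a quoted lemma, that a &SUGGEST reading ends
    with the generated form, and that lines starting with ";" are removed
    readings.
    """
    forms = []
    for line in cg_text.splitlines():
        stripped = line.strip()
        # Skip the wordform line ("<...>"), removed readings (";") and blanks.
        if not stripped.startswith('"') or stripped.startswith('"<'):
            continue
        tokens = stripped.split()
        # Keep each distinct suggestion once, in order of appearance.
        if "&SUGGEST" in tokens and tokens[-1] not in forms:
            forms.append(tokens[-1])
    return forms


if __name__ == "__main__":
    cohort = (
        '"<Álgoálbmot nissonat>"\n'
        '\t"álgoálbmotnisu" v1 N Sem/Hum Pl Nom Err/SpaceCmp &msyn-compound msyn-compound\n'
        '\t"álgoálbmotnisu" v1 N Sem/Hum Pl Nom &SUGGEST COPY:3842:compound álgoálbmotnisu+v1+N+Pl+Nom álgoálbmotnissonat\n'
        ';\t"álgoálbmotnisu" v1 N Sem/Hum Sg Acc PxSg2 Err/SpaceCmp\n'
    )
    print(suggestions_from_cohort(cohort))  # ['álgoálbmotnissonat']
```

On the cohort above, this yields only the lowercase álgoálbmotnissonat, even though the wordform "<Álgoálbmot nissonat>" starts with a capital.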

albbas commented 3 years ago

Comment 14208

Date: 2021-02-16 14:34:51 +0100 From: Tommi A Pirinen <>

I tried editing divvun-suggest, but I'm not sure it's the correct place, so I'm waiting for Kevin's feedback on GitHub:

echo Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui. \
  | hfst-tokenise -g tokeniser-gramcheck-gt-desc.pmhfst \
  | divvun-blanktag analyser-gt-whitespace.hfst \
  | vislcg3 --trace -g valency.bin \
  | vislcg3 --trace -g mwe-dis.bin \
  | cg-mwesplit \
  | divvun-blanktag analyser-gt-errorwhitespace.hfst \
  | divvun-cgspell -n 10 -b 15 -w 5000 -u 0.4 -l acceptor.default.hfst -m errmodel.default.hfst \
  | vislcg3 --trace -g grc-disambiguator.bin \
  | vislcg3 --trace -g spellchecker.bin \
  | vislcg3 --trace -g after-speller-disambiguator.bin \
  | vislcg3 --trace -g grammarchecker.cg3 \
  | ~/github/divvun/libdivvun/src/divvun-suggest -g generator-gramcheck-gt-norm.hfstol -m errors.xml \
  | head -n 75

"<Álgoálbmot nissonat>"
	"álgoálbmotnisu" v1 N Sem/Hum Pl Nom Err/SpaceCmp SELECT:3006:fallback SELECT:3006:fallback SELECT:18468:r2339 SELECT:6513:r1025 MAP:23650 @SUBJ> &msyn-compound #1->1 ADD:10043:compound msyn-compound
	"álgoálbmotnisu" v1 N Sem/Hum Pl Nom SELECT:3006:fallback SELECT:3006:fallback SELECT:18468:r2339 SELECT:6513:r1025 MAP:23650 @SUBJ> &SUGGEST #1->1 ADD:10043:compound COPY:10052:compound álgoálbmotnisu+v1+N+Pl+Nom Álgoálbmotnissonat
	"álgoálbmotnisu" v2 N Sem/Hum Pl Nom Err/SpaceCmp SELECT:3006:fallback SELECT:3006:fallback SELECT:18468:r2339 SELECT:6513:r1025 MAP:23650 @SUBJ> &msyn-compound #1->1 ADD:10043:compound msyn-compound
	"álgoálbmotnisu" v2 N Sem/Hum Pl Nom SELECT:3006:fallback SELECT:3006:fallback SELECT:18468:r2339 SELECT:6513:r1025 MAP:23650 @SUBJ> &SUGGEST #1->1 ADD:10043:compound COPY:10052:compound álgoálbmotnisu+v2+N+Pl+Nom Álgoálbmotnissonat

It's possible that it breaks everything else...
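The patch itself isn't shown in this thread. As a rough illustration of the kind of case transfer involved, here is a hypothetical Python helper (match_initial_case is not a libdivvun function) that copies an initial capital, or all-caps, from the flagged wordform onto a generated suggestion. It is a sketch under those two cases only, not the divvun-suggest implementation.

```python
def match_initial_case(original: str, suggestion: str) -> str:
    """Copy the capitalization pattern of `original` onto `suggestion`.

    Hypothetical sketch: handles ALL-CAPS and Initial-capital input only;
    any other pattern leaves the suggestion unchanged.
    """
    if not original or not suggestion:
        return suggestion
    letters = [c for c in original if c.isalpha()]
    # More than one letter, all uppercase: treat the input as ALL-CAPS.
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return suggestion.upper()
    # Otherwise only mirror a capitalized first character.
    if original[0].isupper():
        return suggestion[0].upper() + suggestion[1:]
    return suggestion


if __name__ == "__main__":
    print(match_initial_case("Álgoálbmot nissonat", "álgoálbmotnissonat"))  # Álgoálbmotnissonat
```

Checking every letter, rather than only the first two, means mixed input such as "ÁLGOÁLBMOT nissonat" falls through to the initial-capital case instead of being upcased entirely.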

albbas commented 3 years ago

Comment 14209

Date: 2021-02-16 18:06:01 +0100 From: Linda Wiechetek <>

Great! And oh no... I would like Sjur's comments on this as well.

albbas commented 3 years ago

Comment 14210

Date: 2021-02-16 21:23:40 +0100 From: Kevin Brubeck Unhammer <<unhammer+apertium>>

Casing was already applied in the JSON and library modes (used by the web and LibreOffice):

$ echo 'Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.' | bash smegramj.mode | jq .
{
  "errs": [
    [
      "Álgoálbmot nissonat",
      0,
      19,
      "msyn-compound",
      "\"Álgoálbmot nissonat\" orru leamen goallossátni",
      [
        "Álgoálbmotnissonat"
      ],
      "Goallosteapmi"
    ]
  ],
  "text": "Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.\n"
}
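For consumers of the JSON mode, here is a minimal Python sketch of applying the first suggestion of each error to the text. The field layout [form, start, end, error id, message, suggestions, title] is inferred from this single example, not taken from libdivvun documentation, and apply_first_suggestions is a hypothetical name. The offsets 0 and 19 match the character length of "Álgoálbmot nissonat" here, so the sketch treats them as character offsets and double-checks the span before replacing it.

```python
import json

# JSON copied from the smegramj.mode run above.
OUTPUT = r'''{"errs": [["Álgoálbmot nissonat", 0, 19, "msyn-compound",
  "\"Álgoálbmot nissonat\" orru leamen goallossátni",
  ["Álgoálbmotnissonat"], "Goallosteapmi"]],
 "text": "Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.\n"}'''


def apply_first_suggestions(result: dict) -> str:
    """Replace each flagged span in result["text"] with its first suggestion.

    Assumption (inferred from one example): each entry in "errs" is
    [form, start, end, error_id, message, suggestions, title], with start/end
    as character offsets into "text".
    """
    text = result["text"]
    # Apply right-to-left so earlier offsets stay valid after each replacement.
    for form, start, end, _err_id, _msg, suggestions, _title in sorted(
        result["errs"], key=lambda err: err[1], reverse=True
    ):
        # Only replace if the offsets really point at the flagged form.
        if suggestions and text[start:end] == form:
            text = text[:start] + suggestions[0] + text[end:]
    return text


if __name__ == "__main__":
    print(apply_first_suggestions(json.loads(OUTPUT)))
```

Working from the last error to the first avoids having to shift offsets when a suggestion is shorter or longer than the span it replaces.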

It just wasn't applied in the CG output. Tommi's fix changes that for CG, though the CG output is still somewhat "low-level": more happens to suggestions in the JSON output (e.g. expanding the underlines based on relations). Since that only happens once a full sentence has been processed, perhaps we could output the full suggestions in a "comment" after the period in the CG output, something like:

[…other cohorts…]
"<ektui>"
	"eaktu" §DE N Sem/Dummytag Sg Ill @<ADVL #13->13
	"ektui" Po @<ADVL #13->13
"<.>"
	"." CLB #14->14
REPS: {"Álgoálbmot nissonat": "Álgoálbmotnissonat"}
:\n
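The proposed REPS line could, for instance, be built from the same errs array that the JSON mode produces. This is a hypothetical Python sketch: the REPS format is only a proposal in this thread, and reps_comment is an illustrative name, not an existing libdivvun function.

```python
import json


def reps_comment(errs: list) -> str:
    """Format a REPS line mapping each flagged form to its first suggestion.

    Hypothetical sketch of the proposal above; entries without suggestions are
    skipped, and if the same form is flagged twice the later entry wins.
    """
    reps = {err[0]: err[5][0] for err in errs if err[5]}
    # ensure_ascii=False keeps Sámi letters such as "Á" readable.
    return "REPS: " + json.dumps(reps, ensure_ascii=False)


if __name__ == "__main__":
    errs = [["Álgoálbmot nissonat", 0, 19, "msyn-compound",
             "\"Álgoálbmot nissonat\" orru leamen goallossátni",
             ["Álgoálbmotnissonat"], "Goallosteapmi"]]
    print(reps_comment(errs))
    # REPS: {"Álgoálbmot nissonat": "Álgoálbmotnissonat"}
```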

albbas commented 2 months ago

Works with the current smegram.