Closed albbas closed 8 months ago
This is on a Mac. divvun-checker --version reports: divvun-checker - libdivvun version 0.3.11-alpha
Probably a duplicate of giellalt/lang-smj#38
Hm, not so sure it's a dupe; there is a relation towards word 1 here.
What does
sed 's,divvun-suggest,& -j,g' modes/smegram-dev.mode >modes/smegram-dev-j.mode
echo dego álgodiehtun. |bash modes/smegram-dev-j.mode
give?
divvun-checker should be using the same method as divvun-suggest -j/--json (and plain divvun-suggest in CG format now also uses most of the same code as --json). I tried removing the suggestions from your CG output to fake the output of the step before suggest, which gave me:
$ cat /tmp/dego.txt
"<dego>"
"dego" Adv <W:0.0> <firstCohort> &LINK &syn-not-dego ID:1
:
"<álgodiehtun>"
"álgodiehtu" N Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1
"álgodiehtu" N Sem/Prod-cogn_Txt Sg Acc Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
"álgodiehtu" N Sem/Prod-cogn_Txt Sg Gen Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
"diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1
"álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
"<.>"
"." CLB <W:0.0> <LastCohort>
:\n
$ cat /tmp/dego.txt | ./divvun-suggest -j /usr/share/giella/sme/tokeniser-gramcheck-gt-desc.pmhfst |jq
libdivvun: WARNING: >1 transducers in stream! Only using the first.
./divvun-suggest WARNING: no errors.xml argument; tags used as error messages.
divvun-suggest: WARNING: No <description> for "syn-not-dego" in any xml:lang
{
  "errs": [
    [
      "degoálgodiehtun",
      0,
      15,
      "syn-not-dego",
      "syn-not-dego",
      [
        "álgodiehtun"
      ],
      "syn-not-dego"
    ]
  ],
  "text": "degoálgodiehtun.\n"
}
Here's my result:
❯ sed 's,divvun-suggest,& -j,g' modes/smegram-dev.mode >modes/smegram-dev-j.mode
❯ echo dego álgodiehtun. |bash modes/smegram-dev-j.mode|jq .
{
  "errs": [
    [
      "dego álgodiehtun",
      0,
      16,
      "syn-not-dego",
      "ii galgga leat \"dego\"",
      [
        "álgodiehtun"
      ],
      "Cealkkameattáhus"
    ]
  ],
  "text": "dego álgodiehtun.\n"
}
OK, so the error only exists when run as a zcheck file with checker, odd.
Can you upload the se.zcheck here or on gtweb?
It's on gtweb now, my home directory.
$ rsync gtweb:se.zcheck .
$ mkdir modes
$ unzip -q se.zcheck
$ divvun-gen-sh -d modes pipespec.xml >/dev/null
$ echo dego álgodiehtun. |bash modes/smegram.mode
"<dego>"
"dego" Adv <W:0.0> <firstCohort>
:
"<álgodiehtun>"
"álgodiehtu" N Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
"álgodiehtu" N Sem/Prod-cogn_Txt Sg Acc Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
"álgodiehtu" N Sem/Prod-cogn_Txt Sg Gen Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
"diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
"álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0>
"<.>"
"." CLB <W:0.0> <LastCohort>
:\n
Here there seems to be a difference in output between smegram and smegram-dev.
(If I manually append vislcg -g ~/src/uit-langtech/lang-sme/tools/grammarcheckers/grammarchecker.cg3 at the end, I do get the error tags; how is grammarchecker-release.bin built, does something get removed from it?)
Aren't smegram and smegram-dev supposed to give the same result? I thought the only difference was that smegram-dev uses .cg3 files from the file system, while smegram uses compiled cg3 files packed into the .zhfst file?
I see from pipespec.xml.in that they use different files: smegram-dev uses grammarchecker.cg3, while smegram uses grammarchecker-release.bin. I also see this in Makefile.am:
if HAVE_VISLCG_FILTER
grammarchecker-release.bin: grammarchecker.cg3
	$(AM_V_CGCOMP)"$(VISLCG3)" --grammar $< --grammar-bin $@ --grammar-only --nrules-v "^x"
else
grammarchecker-release.cg3: $(srcdir)/grammarchecker.cg3 \
                            $(GIELLA_CORE)/scripts/gc-release.awk
	$(AM_V_GEN)$(GAWK) -f $(GIELLA_CORE)/scripts/gc-release.awk $< > $@
--nrules-v is documented as "a regex for which rule names not to parse/run".
$ echo dego álgodiehtun. |bash modes/smegram.mode|vislcg3 -g ~/src/uit-langtech/lang-sme/tools/grammarcheckers/grammarchecker.cg3 -t
"<dego>"
"dego" Adv <W:0.0> <firstCohort> &LINK &syn-not-dego ID:1 ADD:9871:xsyn-not-dego ADD:9871:xsyn-not-dego
:
"<álgodiehtun>"
"álgodiehtu" N Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego ADDRELATION(DELETE1):9873:xsyn-not-dego-2 COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
"álgodiehtu" N Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego ADDRELATION(DELETE1):9873:xsyn-not-dego-2 COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
"álgodiehtu" N Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom &SUGGEST ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego ADDRELATION(DELETE1):9873:xsyn-not-dego-2 COPY:9874:syn-dego-ess
"álgodiehtu" N Sem/Prod-cogn_Txt Sg Acc Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
"álgodiehtu" N Sem/Prod-cogn_Txt Sg Gen Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
"diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
"álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
"diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
"álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
"diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom &SUGGEST ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
"álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
"<.>"
"." CLB <W:0.0> <LastCohort>
:\n
and the rules that fire, such as ADD:9872:xsyn-not-dego, have an x at the start of their name, which means they are dropped from grammarchecker-release.
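A rough way to see which rules in a trace are affected: the trace tags have the form TYPE:line:name, so the rule names can be pulled out and matched against the same ^x pattern that the Makefile passes to --nrules-v. This is only a sketch using grep and sed; the function name rules_dropped_by_release is made up for illustration, and it assumes vislcg3 trace output like the above on stdin.

```shell
# Hypothetical helper: read vislcg3 trace output on stdin and print the names
# of traced rules that start with "x", i.e. the ones matched by the "^x"
# pattern given to --nrules-v in the release build.
rules_dropped_by_release() {
    # Trace tags look like ADD:9871:xsyn-not-dego or
    # ADDRELATION(DELETE1):9873:xsyn-not-dego-2; keep only the rule name.
    grep -oE '[A-Z]+(\([^)]*\))?:[0-9]+:[^ ]+' \
        | sed -E 's/^[^:]+:[0-9]+://' \
        | sort -u \
        | grep -E '^x'
}
```

It could be used as: echo dego álgodiehtun. | bash modes/smegram.mode | vislcg3 -g grammarchecker.cg3 -t | rules_dropped_by_release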
What makes the whole thing more confusing is that there are two pipes named smegram and smegramrelease, but both run grammarchecker-release (I had assumed that the one without -release in the pipe name ran the CG that does not have -release in its name).
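To make this kind of mix-up easier to spot, the generated mode scripts can be scanned for the grammarchecker file each one mentions. This is a minimal sketch, assuming the scripts produced by divvun-gen-sh name the grammar file literally on the vislcg3 command line; the function name grammarchecker_per_mode is hypothetical.

```shell
# Hypothetical helper: for each generated mode script in a directory
# (default: modes), print "<mode file>: <grammarchecker file(s) it mentions>".
grammarchecker_per_mode() {
    dir=${1:-modes}
    for mode in "$dir"/*.mode; do
        [ -e "$mode" ] || continue
        # Matches e.g. grammarchecker.cg3 or grammarchecker-release.bin,
        # also when the name is preceded by a directory path.
        used=$(grep -oE 'grammarchecker[A-Za-z0-9_.-]*' "$mode" | sort -u | tr '\n' ' ')
        printf '%s: %s\n' "$(basename "$mode")" "${used% }"
    done
}
```

Run as grammarchecker_per_mode modes after divvun-gen-sh -d modes pipespec.xml; a mode whose script mentions no grammarchecker file is printed with an empty value.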
It is probably just a matter of changing that, so that se.zcheck works as expected.
I agree with @albbas, but we should check with @lynnda-hill.
@lynnda-hill also agreed, but said we have to make sure that any yaml tests that test what is meant to reach users use the smegramrelease mode.
I don't know how yaml-modes are defined, but I can change that pipespec right away; if several tests suddenly start failing, that should make it obvious, provided the system works as it should :)
But: is grammarchecker.bin included in the package? I see the file is missing from that se.zcheck file, so this presumably has to be changed in several places.
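Since se.zcheck unpacks with unzip, one quick way to check a given archive for a file is to list its members and look for the name. A minimal sketch, assuming an Info-ZIP unzip with the -Z1 (names-only zipinfo) listing; the function name zcheck_contains is made up here.

```shell
# Hypothetical helper: succeed if the archive given as $1 has a member whose
# file name (ignoring any directory part) is exactly $2.
zcheck_contains() {
    unzip -Z1 "$1" | sed 's,.*/,,' | grep -qxF -- "$2"
}
```

For example: zcheck_contains se.zcheck grammarchecker.bin && echo present || echo missing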
This commit fixes this issue