divvun / libdivvun

lib for running gramcheck and other pipelines + cli; modules for CG→spelling, CG→feedback, tagging blanks
https://giellalt.github.io/proof/gramcheck/GrammarCheckerDocumentation.html
GNU General Public License v3.0

Error detected by smegram-dev.mode is not detected by divvun-checker #67

Closed albbas closed 8 months ago

albbas commented 8 months ago
❯ echo dego álgodiehtun. | divvun-checker -a tools/grammarcheckers/se.zcheck -n smegram
{"errs":[],"text":"dego álgodiehtun."}
❯ echo dego álgodiehtun. | tools/grammarcheckers/modes/smegram-dev.mode | less -R
"<dego>"
        "dego" Adv <W:0.0> <firstCohort> &LINK &syn-not-dego ID:1
syn-not-dego
: 
"<álgodiehtun>"         dego álgodiehtun        🖝  álgodiehtun
        "álgodiehtu" N Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1
syn-not-dego
        "álgodiehtu" N Sem/Prod-cogn_Txt Sg Acc Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
        "álgodiehtu" N Sem/Prod-cogn_Txt Sg Gen Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
        "diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1
                "álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
syn-not-dego
"<.>"
        "." CLB <W:0.0> <LastCohort>
:\n
albbas commented 8 months ago

This is on a Mac:

❯ divvun-checker --version
divvun-checker - libdivvun version 0.3.11-alpha

albbas commented 8 months ago

Probably a duplicate of giellalt/lang-smj#38

unhammer commented 8 months ago

Hm, not so sure it's a dupe; there is a relation towards word 1 here.

What does

sed 's,divvun-suggest,& -j,g' modes/smegram-dev.mode >modes/smegram-dev-j.mode
echo dego álgodiehtun. |bash modes/smegram-dev-j.mode

give?

divvun-checker should be using the same method as divvun-suggest -j/--json (and plain divvun-suggest in CG format now also uses most of the same code as --json). I tried removing the suggestions from your CG output to fake the output of the step before suggest, which gave me:

$ cat /tmp/dego.txt
"<dego>"
    "dego" Adv <W:0.0> <firstCohort> &LINK &syn-not-dego ID:1
:
"<álgodiehtun>"
    "álgodiehtu" N Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1
    "álgodiehtu" N Sem/Prod-cogn_Txt Sg Acc Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
    "álgodiehtu" N Sem/Prod-cogn_Txt Sg Gen Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
    "diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1
        "álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
"<.>"
    "." CLB <W:0.0> <LastCohort>
:\n

$ cat /tmp/dego.txt | ./divvun-suggest -j /usr/share/giella/sme/tokeniser-gramcheck-gt-desc.pmhfst |jq
libdivvun: WARNING: >1 transducers in stream! Only using the first.
./divvun-suggest WARNING: no errors.xml argument; tags used as error messages.
divvun-suggest: WARNING: No <description> for "syn-not-dego" in any xml:lang
{
  "errs": [
    [
      "degoálgodiehtun",
      0,
      15,
      "syn-not-dego",
      "syn-not-dego",
      [
        "álgodiehtun"
      ],
      "syn-not-dego"
    ]
  ],
  "text": "degoálgodiehtun.\n"
}
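
The hand-edit described above (removing the suggestions from the CG-format output to get back something like the input to the suggest step) can be roughly approximated with a filter. This is only a sketch: strip_suggest_lines is a hypothetical helper, and the line classification is an assumption based on the output pasted in this thread, not on a specification of the format. It does not remove tags that earlier steps added to the readings.

```shell
#!/bin/sh
# Hypothetical helper (not part of libdivvun): reduce CG-format output that has
# suggestion text mixed in to a plain CG stream, judging only by line shape.
strip_suggest_lines() {
    awk '
        # Wordform line: keep only the "<...>" token, drop anything appended after it.
        /^"<.*>"/ {
            if (match($0, /^"<[^>]*>"/)) print substr($0, RSTART, RLENGTH)
            else print
            next
        }
        # Reading lines (indented, starting with a quoted lemma) pass through unchanged.
        /^[ \t]+"/ { print; next }
        # Lines starting with ":" carry the text between tokens; keep them as they are.
        /^:/ { print; next }
        # Everything else (for example a bare error-tag line) is dropped.
    '
}
```

Usage would be something like `bash modes/smegram-dev.mode | strip_suggest_lines > /tmp/dego.txt`, with the same caveat that the result is a fake of the previous step, not its real output.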
albbas commented 8 months ago

Here's my result:

❯ sed 's,divvun-suggest,& -j,g' modes/smegram-dev.mode >modes/smegram-dev-j.mode 
❯ echo dego álgodiehtun. |bash modes/smegram-dev-j.mode|jq .
{
  "errs": [
    [
      "dego álgodiehtun",
      0,
      16,
      "syn-not-dego",
      "ii galgga leat \"dego\"",
      [
        "álgodiehtun"
      ],
      "Cealkkameattáhus"
    ]
  ],
  "text": "dego álgodiehtun.\n"
}
unhammer commented 8 months ago

OK, so the problem only shows up when the pipeline is run from a zcheck file with divvun-checker. Odd.

Can you upload the se.zcheck here or on gtweb?

albbas commented 8 months ago

It's on gtweb now, in my home directory.

unhammer commented 8 months ago
$ rsync gtweb:se.zcheck .
$ mkdir modes
$ unzip -q se.zcheck
$ divvun-gen-sh -d modes pipespec.xml >/dev/null
$ echo dego álgodiehtun. |bash modes/smegram.mode
"<dego>"
        "dego" Adv <W:0.0> <firstCohort>
:
"<álgodiehtun>"
        "álgodiehtu" N Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
        "álgodiehtu" N Sem/Prod-cogn_Txt Sg Acc Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
        "álgodiehtu" N Sem/Prod-cogn_Txt Sg Gen Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
        "diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS<
                "álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0>
"<.>"
        "." CLB <W:0.0> <LastCohort>
:\n

Here there seems to be a difference in output between smegram and smegram-dev.

(If I manually append vislcg3 -g ~/src/uit-langtech/lang-sme/tools/grammarcheckers/grammarchecker.cg3 at the end, I do get the error tags. How is grammarchecker-release.bin built? Is something removed from it?)

albbas commented 8 months ago

Aren't smegram and smegram-dev supposed to give the same result? I thought the only difference was that smegram-dev uses .cg3 files from the file system, while smegram uses compiled cg3 files that are packed into the .zhfst file.

unhammer commented 8 months ago

I see from pipespec.xml.in that they use different files: smegram-dev uses grammarchecker.cg3, while smegram uses grammarchecker-release.bin.

I also see this in Makefile.am:

if HAVE_VISLCG_FILTER
grammarchecker-release.bin: grammarchecker.cg3
	$(AM_V_CGCOMP)"$(VISLCG3)" --grammar $< --grammar-bin $@ --grammar-only --nrules-v "^x"
else
grammarchecker-release.cg3: $(srcdir)/grammarchecker.cg3 \
                            $(GIELLA_CORE)/scripts/gc-release.awk
	$(AM_V_GEN)$(GAWK) -f $(GIELLA_CORE)/scripts/gc-release.awk $< > $@

unhammer commented 8 months ago

--nrules-v takes a regex for which rule names not to parse/run, so "^x" skips every rule whose name starts with x.

unhammer commented 8 months ago
$ echo dego álgodiehtun. |bash modes/smegram.mode|vislcg3 -g ~/src/uit-langtech/lang-sme/tools/grammarcheckers/grammarchecker.cg3 -t
"<dego>"
        "dego" Adv <W:0.0> <firstCohort> &LINK &syn-not-dego ID:1 ADD:9871:xsyn-not-dego ADD:9871:xsyn-not-dego
:
"<álgodiehtun>"
        "álgodiehtu" N Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego ADDRELATION(DELETE1):9873:xsyn-not-dego-2 COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
        "álgodiehtu" N Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego ADDRELATION(DELETE1):9873:xsyn-not-dego-2 COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
        "álgodiehtu" N Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom &SUGGEST ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego ADDRELATION(DELETE1):9873:xsyn-not-dego-2 COPY:9874:syn-dego-ess
        "álgodiehtu" N Sem/Prod-cogn_Txt Sg Acc Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
        "álgodiehtu" N Sem/Prod-cogn_Txt Sg Gen Err/Orth PxSg1 <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< ID:2 R:DELETE1:1 R:$2:1
        "diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt Ess <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< &syn-not-dego ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
                "álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
        "diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
                "álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
        "diehtu" N <TH-birra-Any> Sem/Prod-cogn_Txt <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> @COMP-CS< Sg Nom &syn-dego-nom &SUGGEST ID:2 R:DELETE1:1 R:$2:1 ADD:9872:xsyn-not-dego COPY:9874:syn-dego-ess
                "álgu" N Sem/Time Cmp/SgNom Cmp <W:0.0> ID:2 R:DELETE1:1 R:$2:1
"<.>"
        "." CLB <W:0.0> <LastCohort>
:\n

and the rules that run, e.g. ADD:9872:xsyn-not-dego, have an x at the start of their name, which means they are dropped from grammarchecker-release.
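
To make the effect of the "^x" pattern concrete, here is a small sketch that applies it to the rule names visible in the trace above. grep only stands in for the rule-name matching here; that vislcg3 applies the regex to rule names in the same way is an assumption, and kept_rules is a hypothetical helper.

```shell
#!/bin/sh
# Rule names copied from the trace in this thread.
rules='xsyn-not-dego
xsyn-not-dego-2
syn-dego-ess'

# Print only the rule names that do NOT match the regex given as $1,
# mimicking what --nrules-v is described as doing.
kept_rules() {
    printf '%s\n' "$rules" | grep -v -E "$1"
}

kept_rules '^x'   # prints: syn-dego-ess
```

Only syn-dego-ess survives the filter. The rules that add &syn-not-dego in the first place are the ones filtered out, which fits the release pipeline emitting no error tag at all.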


What makes the whole thing more confusing is that there are two pipes called smegram and smegramrelease, but both run grammarchecker-release (I had assumed that the one lacking -release in its pipe name ran the CG that doesn't have -release in its name).
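
One way to see which grammar each pipe actually runs is to grep the generated mode scripts. This is a sketch under assumptions: list_mode_grammars is a hypothetical helper, and it assumes that the .mode files written by divvun-gen-sh -d mention the grammar file by name (as the vislcg3 -g calls in this thread suggest) and that the grammar file names start with "grammarchecker".

```shell
#!/bin/sh
# Hypothetical helper: for every .mode script in directory $1, print
# "<mode file>:<grammar file>" for each grammarchecker grammar it mentions.
list_mode_grammars() {
    grep -H -o -E 'grammarchecker[-A-Za-z0-9_]*\.(bin|cg3)' "$1"/*.mode
}
```

With the modes directory generated earlier in this thread, the call would be `list_mode_grammars modes`; two pipes printing the same grammar file would point to the kind of naming mismatch described above.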

albbas commented 8 months ago

It's probably just a matter of changing that, so that se.zcheck works as expected.

snomos commented 8 months ago

I agree with @albbas, but we should check with @lynnda-hill.

unhammer commented 8 months ago

@lynnda-hill also agreed, but said we have to make sure that any yaml tests covering things that go out to users use the smegramrelease mode.

unhammer commented 8 months ago

I don't know how the yaml modes are defined, but I can change that pipespec right away; if several tests suddenly fail, that should become obvious, provided the system works as it should :)

But: is grammarchecker.bin included in the package? I see the file is missing from that se.zcheck file, so this probably has to be changed in several places.

albbas commented 8 months ago

This commit fixes this issue