giellalt / lang-sme

Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Northern Sami language
GNU General Public License v3.0

Marks particle as error instead of the preceding Err/Orth of the same mwe #45

Open duomdaamaendra opened 2 years ago

duomdaamaendra commented 2 years ago
[Screenshot 2022-01-28 at 04.17.25]

(↑ is , ↓ is this issue)

duomdaamaendra commented 2 years ago
[Screenshot 2022-01-28 at 04.25.50]

This example comes from a misspelled word (correct: "-diehtagis"), but the marked part is correct: it is the non-enclitic particle "gis".

snomos commented 2 years ago

In both cases I need the original text to be able to reproduce and debug. The paragraph containing the problem should be enough, maybe even just the sentence.

duomdaamaendra commented 2 years ago

Jos dal vel Sámis leat sullasaš dilit go davviriikkain muđuid, de fuobmá árvvoštallamiin goit ovtta erenoamáš ášši mii earuha sámi árvvoštallamiid omd. dáža árvvoštallamiin. Girječálli birra, ja su ođđa girji ovddeš bargguiguin veardádallon, gávnnat hárve sámi árvvoštallamiin. Čiekŋaleabbo dieđu go ahte gos čálli lea riegádan ja gos ássá, gávnnat hárve. Oalle dábálaš lea dákkár diehtu lohkkái: «Mus eai leat obanassii sánitge rámidit nn čehppodaga, dajan dušše ahte áŋgirit ja čeahpit gultturbargi ii gávnna ohcaminge.» (Samefolket 1/89, s. 92). Fuobmá maiddái dán čállosa ovdamearkka vuosttas siiddus: «- rohkkes Láhpoluobbala gollenieida …» Čállái báhcá goit rápmi, jos dal ii čiekŋalit ággaduvvon.

duomdaamaendra commented 2 years ago

Jos ohcala siva dasa ahte čálli birra leat uhcán dieđut, de vástádus dáidá leat nu álki go ahte nie šaddá lunddolaččat servodagasgos [116] buohkat dovddadit. Árvvoštalli duhtá dasto daid dábálaš dieđuide mat juo buohkain leat čálli birra. Dás han maiddái lea sáhka dušše ođđa girjjiin, ja otnáš čálliin. Ii árvvoštalli arvva lebbet dieđuid čálli eallimis omd., jos dal vel oaivvildeš ahte leat leamaš váikkuheaddji áššit čálli loahpalaš bargui, go dát sáhttet leat oalle persovnnalaččat ja dan sivas eai gula almmolašvuhtii. Sáhttá gal maiddái árvvoštalli leat oaivvildeamen ahte eat dárbbaš dárkilieabbo dieđuid go mat mis juo buohkain leat. Dán dili bahá ja buriid beliid garvván dán oktavuođas. Muhto dattege, jos vuos dieavaslaččat áigut árvvoštallat sámi girjjiid, de fertet árvvoštaladettiin maiddái ohcalit ja čállit biográfalaš dieđuid, muhto dieđusge dakkár dieđuid mat leat relevánta. Geaid luhtte čálli lea ijastallan maŋemuš jagiid, ja galle luovosmáná sus leat, eai leat eanemus relevánta dieđut. Dattege sáhttet leat čálli birrasis dakkár váikkuheaddji elementtat maid birra sáhtášii leat dehálaš diehtit. Ahte čálli eallimis sáhttet leat váikkuheaddji olbmot, dáhpáhusat ja fearánat mat leat váikkuhan su go girjji čálii, leat girjjálašvuođadiehttagis dohkkehan dutkanveara áššin. Historjjábiográfalaš árvvoštallama vuogis dát lea guovddáš ášši, earret dieđusge teaksta dahje girji maid čálli almmuha.

duomdaamaendra commented 2 years ago

The same happens in Google Docs.

snomos commented 2 years ago

The first case, gavnnat / nn, seems to be the same as this bug. That is, this bug is not restricted to GDocs.

snomos commented 2 years ago

The second example, the gis bug, is most likely on our end, but needs further investigation.

duomdaamaendra commented 2 years ago


lynnda-hill commented 2 years ago

    "<girjjálašvuođadiehtta>"
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp #21->21
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp #21->21
        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp #21->21
        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp #21->21
    ""
        "gis" Pcle @PCLE MAP:22087:r16 &typo #22->22 ADD:10066:Err/Orth-any
            "diehtit" V TV Ind Prs Sg3 Err/Orth SUBSTITUTE:4876 #22->22
        typo
        "gis" Pcle @PCLE MAP:22087:r16 &typo &SUGGEST #22->22 ADD:10066:Err/Orth-any COPY:10075:Err/Orth-any
            "diehtit" V Ind Prs Sg3 SUBSTITUTE:4876 #22->22
    diehtit+V+Ind+Prs+Sg3#gis+Pcle ?
    :

This seems to be the old particle problem again; we should really do something about it.

unhammer commented 2 years ago

This is what's going on:

$ echo 'girjjálašvuođadiehttagis' | modes/trace-smegramrelease3-cg.mode 
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> "<gis>" <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
                        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp <W:0.0> "<girjjálašvuođadiehtta>" <LastCohort> <firstCohort>

$ echo 'girjjálašvuođadiehttagis' | modes/trace-smegramrelease4-mwe-split.mode
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> <LastCohort> <firstCohort>
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876

$ echo 'girjjálašvuođadiehttagis' | modes/trace-smegramrelease.mode 
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girji" Ex/N Sem/Txt Der/lasj Ex/A Ex/Attr Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálaš" Ex/A Der/vuota N Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "girjjálašvuohta" N Sem/Txt Cmp/SgGen Cmp <W:0.0> <LastCohort> <firstCohort>
        "gis" Pcle <W:0.0> <LastCohort> <firstCohort> @PCLE MAP:22090:r16 &typo ADD:10126:Err/Orth-any
                "diehtit" V <EX-Nom-Ani> TV Ind Prs Sg3 Err/Orth <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
        "gis" Pcle <W:0.0> <LastCohort> <firstCohort> @PCLE MAP:22090:r16 &typo &SUGGEST ADD:10126:Err/Orth-any COPY:10135:Err/Orth-any
                "diehtit" V <EX-Nom-Ani> Ind Prs Sg3 <W:0.0> <LastCohort> <firstCohort> SUBSTITUTE:4876
diehtit+V+Ind+Prs+Sg3#gis+Pcle  ?

Much simplified, we have the following from the analyser:

    "c" Pcle "<c>"
        "b" V Err/Orth
            "a" N "<ab>"

which cg-mwesplit turns into

    "a" N
    "c" Pcle
        "b" V Err/Orth

Now the generator gets sent

    b+V#c+Pcle

which doesn't give any results. If we could send just b+V, we would get the correct form for that part, but then we'd need an input mark between "<a>" and "<b>" so we got

    "c" Pcle "<c>"
        "b" V Err/Orth "<b>"
            "a" N "<a>"

or in the original example:

    "gis" Pcle "<gis>"
        "diehtit" V TV Ind Prs Sg3 Err/Orth "<diehtta>"
            "girjjálašvuohta" N Cmp/SgGen Cmp "<girjjálašvuođa>"

which cg-mwesplit would turn into

    "girjjálašvuohta" N Cmp/SgGen Cmp
    "diehtit" V TV Ind Prs Sg3 Err/Orth
    "gis" Pcle

At least that's one possibility – I have no idea how hard that would be to do on the lexicon side.
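
To make the difference between the two analyses concrete, here is a minimal Python sketch of the splitting behaviour described above. It is a toy model, not cg-mwesplit's actual code: the `Reading` class and the functions `split_on_wordform_marks` and `generator_input` are hypothetical names made up for illustration. The `lemma+Tag#lemma+Tag` string shape is copied from the `diehtit+V+Ind+Prs+Sg3#gis+Pcle` line in the trace, and dropping `Err/` tags before generation is an assumption based on the `&SUGGEST` reading.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    lemma: str
    tags: List[str]
    wordform: Optional[str] = None  # input mark such as "<ab>"; None if absent


Cohort = Tuple[Optional[str], List[Reading]]


def split_on_wordform_marks(readings: List[Reading]) -> List[Cohort]:
    """Toy model of the split described above (not cg-mwesplit's real code).

    `readings` is one reading chain as printed: outermost reading first,
    deepest sub-reading last. A reading with a wordform mark opens a new
    cohort; unmarked sub-readings stay attached to the reading above them.
    Cohorts are returned in surface order (deepest part first).
    """
    cohorts: List[Cohort] = []
    for reading in readings:
        if reading.wordform is not None or not cohorts:
            cohorts.append((reading.wordform, [reading]))
        else:
            cohorts[-1][1].append(reading)
    cohorts.reverse()
    return cohorts


def generator_input(cohort: Cohort) -> str:
    """Join a cohort's readings, deepest first, as lemma+Tag#lemma+Tag.

    Assumption: tags starting with "Err/" are dropped before generation.
    """
    parts = []
    for reading in reversed(cohort[1]):
        tags = [t for t in reading.tags if not t.startswith("Err/")]
        parts.append("+".join([reading.lemma] + tags))
    return "#".join(parts)


# Current analysis: only "a" and "c" carry input marks.
current = [
    Reading("c", ["Pcle"], "<c>"),
    Reading("b", ["V", "Err/Orth"]),
    Reading("a", ["N"], "<ab>"),
]
# Proposed analysis: every part carries its own input mark.
proposed = [
    Reading("c", ["Pcle"], "<c>"),
    Reading("b", ["V", "Err/Orth"], "<b>"),
    Reading("a", ["N"], "<a>"),
]

for name, chain in (("current", current), ("proposed", proposed)):
    print(name, [generator_input(c) for c in split_on_wordform_marks(chain)])
# current ['a+N', 'b+V#c+Pcle']
# proposed ['a+N', 'b+V', 'c+Pcle']
```

With the current marks, the misspelled part and the particle end up in the same cohort, so the generator is asked for both at once. With a mark on every part, `b+V` becomes a cohort of its own and could be generated separately.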