giellalt / lang-sme

Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Northern Sami language
https://giellalt.uit.no
GNU General Public License v3.0

splitting of the pronoun "feara makkárge" #38

Open lynnda-hill opened 2 years ago

lynnda-hill commented 2 years ago

The attributive multi-word pronoun "feara makkárge" is automatically tokenized with a separate particle "ge" before any Constraint Grammar analysis. This is a disadvantage for grammar rules, because a particle then stands between the attributive pronoun and its nominal head. It is also odd that two separate words are first analyzed as one MWE, only for that MWE to be split in two.

Example sentence:

Olbmuid dieđut eará máilmmeosiin bođii mátkegirjjálašvuođas, man čállin ledje feara makkárge johttit dego mišunárat, jesuihtat, sisafárrejeaddjit, šlávagávppašeaddjit ja soalddáhat.

Analysis:

"<feara makkár>"
    "feara makkár" Pron Indef Attr <W:0.0> @OBJ> SELECT:14209:r1569 MAP:23953:r484 #11->11 SUBSTITUTE:10139
;   "feara makkár" MWE Pron Indef Sg Nom <W:0.0> SELECT:14209:r1569
"<ge>"
    "ge" Pcle <W:0.0> @PCLE MAP:22078:r16 #12->12
:
"<johttit>"
    "johtit" Ex/V IV Der/NomAg N Pl Nom <W:0.0> @<SPRED MAP:23586:r3349 &real-ImprtPl2-Inf #13->13 ADD:6065:real-ImprtPl2-Inf
real-ImprtPl2-Inf
    "johtit" <W:0.0> @<SPRED MAP:23586:r3349 V IV Inf &SUGGEST #13->13 ADD:6065:real-ImprtPl2-Inf COPY:6211:real-ImprtPl2-Inf
johtit+V+IV+Inf johtit
    "johtti" N NomAg Sem/Hum Pl Nom <W:0.0> @<SPRED MAP:23586:r3349 #13->13
;   "johtit" V <ala-V> <eret> <rasta> <birra> <IN-Com-Veh> <XT-Acc-Measure> <SO-luhtte-Ani> <DE-Ill-Plc> <DE-sisa-Build> <DE-lusa-Ani> <PT-Gen-Plc><DE-Ill-Any> <PT-Gen-Plc> <PT-rastá-Plc> <PT-meaddel-Plc> <PT-čađa-Plc> <PT-bokte-Plc> <SO-Loc-Ani><DE-Ill-Ani> <SO-Loc-*Ani> <CO-mielde-Ani> <LO-luhtte-Any> <LO-Loc-Plc> IV Imprt Pl2 <W:0.0> SUBSTITUTE:3141 SUBSTITUTE:3174 SUBSTITUTE:3725 SUBSTITUTE:3806 SUBSTITUTE:3810 SUBSTITUTE:3879 SUBSTITUTE:3881 SUBSTITUTE:3886 SUBSTITUTE:3891 SUBSTITUTE:3893 SUBSTITUTE:3895 SUBSTITUTE:3978 SUBSTITUTE:3980 SUBSTITUTE:3987 SUBSTITUTE:3989 SUBSTITUTE:4017 SUBSTITUTE:4098 SUBSTITUTE:4673 SUBSTITUTE:4688 SUBSTITUTE:4714 @+FMAINV SUBSTITUTE:9160 MAP:16650:r406 REMOVE:6102:r948

snomos commented 2 years ago

So what do you want the output / analysis to be?

lynnda-hill commented 2 years ago

I would like to get:

"<feara makkárge>"

instead of:

"<feara makkár>"

snomos commented 10 months ago

After the latest changes in the clitics.lexc file, the analysis is now:

"<feara makkár>"
    "feara makkár" Pron Indef Attr <W:0.0> @OBJ> #11->11
"<ge>"
    "ge" Pcle Foc/Neg-ge <W:0.0> @PCLE #12->12
    "ge" Pcle Foc/Pos-ge <W:0.0> @PCLE #12->12

To avoid this split, we need to lexicalise the pronoun including the clitic. That will give us two analyses that need to be disambiguated in the mwe-dis.cg3 file. If that is still not good enough, we need to remove feara makkár from the clitics altogether.
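
Whichever option is chosen, it may help to check mechanically whether an analysis still splits the pronoun. The following is only a minimal Python sketch, not part of the lang-sme tooling: the helper names `wordforms` and `is_split` are hypothetical, and it assumes the CG stream format shown above, where each cohort starts with a line of the form `"<wordform>"` and reading lines never start with `"<`.

```python
import re

# Matches a cohort (wordform) line such as "<feara makkár>" in CG stream output.
# Reading lines start with whitespace, ';' or a lemma in plain quotes, so they
# do not match this pattern.
COHORT_RE = re.compile(r'^"<(.*)>"\s*$')


def wordforms(cg_output: str) -> list[str]:
    """Return the wordforms of all cohorts in the output, in order."""
    forms = []
    for line in cg_output.splitlines():
        match = COHORT_RE.match(line)
        if match:
            forms.append(match.group(1))
    return forms


def is_split(cg_output: str, stem: str = "feara makkár", clitic: str = "ge") -> bool:
    """True if a `stem` cohort is immediately followed by a separate `clitic` cohort."""
    forms = wordforms(cg_output)
    return any(a == stem and b == clitic for a, b in zip(forms, forms[1:]))


# The analysis quoted in this comment (copied from above).
current = '''"<feara makkár>"
    "feara makkár" Pron Indef Attr <W:0.0> @OBJ> #11->11
"<ge>"
    "ge" Pcle Foc/Neg-ge <W:0.0> @PCLE #12->12
    "ge" Pcle Foc/Pos-ge <W:0.0> @PCLE #12->12
'''

print(is_split(current))  # True
```

With the requested tokenization, the output would contain a single `"<feara makkárge>"` cohort and `is_split` would return `False`.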

Any preferences, @lynnda-hill and @duomdaamaendra?

snomos commented 10 months ago

According to the lexc code, feara makkár can take any clitic. Is this true? What is the status of ge in feara makkárge?