UniversalDependencies / UD_Breton-KEB

MWTs for inflected pronouns #3

Open jheinecke opened 3 years ago

jheinecke commented 3 years ago

I propose to redo the MWTs for inflected prepositions and for merged preposition + article forms by putting the uninflected preposition and the standard pronoun in the form (and lemma) columns (and deleting the indirect lemma), cf. commit 4daa27b:

din

1-2 din _   _
1   da  da  ADP
2   me  me  PRON

similarly for dit, dezhañ, dezhi, deomp/dimp, deoc'h, dezho: da + te, eñ, hi, ni, c'hwi, int and other prepositions like a, dre, e, e-giz, evit, gant, ouzh, war, nemet, hervez, diouzh

er

1-2 er  _   _
1   e   e   ADP
2   ar  an  DET
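
As a rough illustration of the proposal, the split for inflected forms of da could be generated from a small lookup table. This is only a sketch: the table holds just the forms listed above, the function name `split_inflected_da` is made up for the example, and the output uses the simplified four columns (ID, FORM, LEMMA, UPOS) shown in this post rather than full CoNLL-U.

```python
# Illustrative lookup built only from the forms listed in this post;
# it is not the treebank's actual lexicon.
DA_FORMS = {
    "din": "me",
    "dit": "te",
    "dezhañ": "eñ",
    "dezhi": "hi",
    "deomp": "ni",
    "dimp": "ni",
    "deoc'h": "c'hwi",
    "dezho": "int",
}


def split_inflected_da(surface, start_id=1):
    """Return simplified MWT lines (ID, FORM, LEMMA, UPOS) for an inflected form of 'da'."""
    pronoun = DA_FORMS[surface.lower()]
    first, second = start_id, start_id + 1
    return [
        f"{first}-{second}\t{surface}\t_\t_",      # range line keeps the surface form
        f"{first}\tda\tda\tADP",                   # uninflected preposition
        f"{second}\t{pronoun}\t{pronoun}\tPRON",   # standard pronoun as form and lemma
    ]


print("\n".join(split_inflected_da("din")))
```

The other prepositions (a, dre, e, ...) would each need their own table of this kind; the pronoun column stays the same.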

ftyers commented 3 years ago

This works for me. One question I have is what to do when the pronoun is inside the preposition, e.g.:

#  sent_id = bremaik.vislcg.txt:70:2196
...
12-13   en o zouez      _       _       _       _       _       _       _       _
12      en o zouez      e-touez ADP     pr      _       13      case    _       _
13      o       indirect        PRON    prn     Case=Acc|Number=Plur|Person=3|PronType=Prs      14      nmod    _       _
14      kantikoù        kantik  NOUN    n       Gender=Masc|Number=Plur 6       parataxis       _       _
15      gouestlet       gouestlañ       VERB    vblex   Tense=Past|VerbForm=Part        14      acl     _       _
...

jheinecke commented 3 years ago

Interesting case. One could split it in three:

...
12-14   en o zouez  _   _   _   _   _   _   _   _
12  en  e   ADP pr  _   14  case    _   _
13  o   int PRON    prn Case=Acc|Number=Plur|Person=3|PronType=Prs  15  nmod    _   _
14  zouez   touez   NOUN    noun    _   15  nmod    _   _
15  kantikoù    kantik  NOUN    n   Gender=Masc|Number=Plur 6   parataxis   _   _
16  gouestlet   gouestlañ   VERB    vblex   Tense=Past|VerbForm=Part    15  acl _   _
...

but we would lose the ADP e-touez ("amongst") and get e touez ("in a mixture") instead. But isn't this the case for all MWTs?

tlynn747 commented 3 years ago

On our to-do list too :-)

On Wed 2 Sep 2020 at 08:08, Johannes Heinecke < notifications@github.com> wrote:

I propose to redo the MWT of inflected prepositions and merged prepositions

-- Goodbye and blessings

jheinecke commented 3 years ago

Great! BTW I'll do so for Welsh too

colinbatchelor commented 3 years ago

And Scottish Gaelic!

dan-zeman commented 3 years ago

One could split it in three ... but we would lose the ADP e-touez ("amongst") and get e touez "in a mixture". But this is the case for all MWTs ?

Since the syntactic words to which you split a MWT do not have to be substrings of the MWT, you can split it in two (which, if I understand it correctly, reflects the real syntactic words that are there – a preposition and a pronoun), make the first form e-touez and the second word would be the real form of the pronoun.

colinbatchelor commented 3 years ago

One could split it in three ... but we would lose the ADP e-touez ("amongst") and get e touez "in a mixture". But this is the case for all MWTs ?

Since the syntactic words to which you split a MWT do not have to be substrings of the MWT, you can split it in two (which, if I understand it correctly, reflects the real syntactic words that are there – a preposition and a pronoun), make the first form e-touez and the second word would be the real form of the pronoun.

I think I'm missing something here: what would the third word in this example be if 12 is e-touez?

12-14   en o zouez  _   _   _   _   _   _   _   _
12  en  e   ADP pr  _   14  case    _   _
13  o   int PRON    prn Case=Acc|Number=Plur|Person=3|PronType=Prs  15  nmod    _   _
14  zouez   touez   NOUN    noun    _   15  nmod    _   _

jheinecke commented 3 years ago

The lemmas are o and e-touez. The problem is that the pronoun o is infixed into the multitoken e-touez (which is mutated/lenited to zouez). As I understand it, @dan-zeman suggests splitting en o zouez into o and e-touez:

12-13   en o zouez  _   _   _   _   _   _   _   _
12  o   o   PRON    prn Case=Acc|Number=Plur|Person=3|PronType=Prs  14  nmod    _   _
13  e-touez e-touez ADP pr  _   12  case    _   _

but in any case we would have blanks (spaces) in the MWT. I do not know whether the guidelines allow this.

ftyers commented 3 years ago

The guidelines allow it for a closed set of expressions (they need to be defined).

I would probably put e-touez first and then the pronoun after to have the same order for ADP and PRON.
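
To draw up that closed set, one option is to scan the treebank for MWT range lines whose form contains a space. A minimal sketch, assuming tab-separated CoNLL-U input; the function name `mwt_forms_with_spaces` and the sample lines are made up for illustration, and this says nothing about how the official validator handles such tokens.

```python
def mwt_forms_with_spaces(conllu_lines):
    """Collect surface forms of multiword tokens (range IDs such as 12-13) that contain a space."""
    found = set()
    for line in conllu_lines:
        # Skip blank lines (sentence breaks) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        token_id, form = cols[0], cols[1]
        start, sep, end = token_id.partition("-")
        # Only range IDs mark multiword tokens; plain word IDs have no hyphen.
        if sep and start.isdigit() and end.isdigit() and " " in form:
            found.add(form)
    return sorted(found)


sample = [
    "# sent_id = example",
    "12-13\ten o zouez\t_\t_\t_\t_\t_\t_\t_\t_",
    "12\te-touez\te-touez\tADP\tpr\t_\t13\tcase\t_\t_",
    "13\to\to\tPRON\tprn\tCase=Acc|Number=Plur|Person=3|PronType=Prs\t14\tnmod\t_\t_",
]
print(mwt_forms_with_spaces(sample))
```

Running this over the whole treebank would give the list of candidate expressions to define.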

jheinecke commented 3 years ago

Yes, ADP before PRON is better.
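
For reference, the two-word analysis with that order could be written out as below. This is a sketch only: the helper `mwt_block` is hypothetical, the pronoun form and lemma follow my example above (o / o), and the head and deprel values are carried over from the sentence quoted earlier in the thread, so they are assumptions rather than a final annotation.

```python
def mwt_block(surface, words, start_id):
    """Format a multiword token range line followed by its syntactic words.

    `words` is a list of dicts keyed by CoNLL-U field name; missing fields default to "_".
    """
    fields = ["form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]
    end_id = start_id + len(words) - 1
    # Range line: ID range, surface form, then eight empty columns.
    lines = ["\t".join([f"{start_id}-{end_id}", surface] + ["_"] * 8)]
    for offset, word in enumerate(words):
        cols = [str(start_id + offset)] + [str(word.get(name, "_")) for name in fields]
        lines.append("\t".join(cols))
    return lines


en_o_zouez = mwt_block(
    "en o zouez",
    [
        # ADP first, then PRON, matching the order used for din, dit, ...
        {"form": "e-touez", "lemma": "e-touez", "upos": "ADP", "xpos": "pr",
         "head": 13, "deprel": "case"},
        {"form": "o", "lemma": "o", "upos": "PRON", "xpos": "prn",
         "feats": "Case=Acc|Number=Plur|Person=3|PronType=Prs",
         "head": 14, "deprel": "nmod"},
    ],
    start_id=12,
)
print("\n".join(en_o_zouez))
```

Note that neither word form has to be a substring of the surface form en o zouez.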

ftyers commented 3 years ago

One could split it in three ... but we would lose the ADP e-touez ("amongst") and get e touez "in a mixture". But this is the case for all MWTs ?

Since the syntactic words to which you split a MWT do not have to be substrings of the MWT, you can split it in two (which, if I understand it correctly, reflects the real syntactic words that are there – a preposition and a pronoun), make the first form e-touez and the second word would be the real form of the pronoun.

I think I'm missing something here: what would the third word in this example be if 12 is e-touez?

12-14   en o zouez  _   _   _   _   _   _   _   _
12  en  e   ADP pr  _   14  case    _   _
13  o   int PRON    prn Case=Acc|Number=Plur|Person=3|PronType=Prs  15  nmod    _   _
14  zouez   touez   NOUN    noun    _   15  nmod    _   _

There would be no third word; it's basically infixing o inside e-touez (this is a bit of a question about how grammaticalised we think the e-touez "among" expression is).