Open jheinecke opened 3 years ago
This works for me. One question I have is what to do when the pronoun is inside the preposition, e.g.
# sent_id = bremaik.vislcg.txt:70:2196
...
12-13 en o zouez _ _ _ _ _ _ _ _
12 en o zouez e-touez ADP pr _ 13 case _ _
13 o indirect PRON prn Case=Acc|Number=Plur|Person=3|PronType=Prs 14 nmod _ _
14 kantikoù kantik NOUN n Gender=Masc|Number=Plur 6 parataxis _ _
15 gouestlet gouestlañ VERB vblex Tense=Past|VerbForm=Part 14 acl _ _
...
Interesting case. One could split it in three
...
12-14 en o zouez _ _ _ _ _ _ _ _
12 en e ADP pr _ 14 case _ _
13 o int PRON prn Case=Acc|Number=Plur|Person=3|PronType=Prs 15 nmod _ _
14 zouez touez NOUN noun _ 15 nmod _ _
15 kantikoù kantik NOUN n Gender=Masc|Number=Plur 6 parataxis _ _
16 gouestlet gouestlañ VERB vblex Tense=Past|VerbForm=Part 15 acl _ _
...
but we would lose the ADP e-touez ("amongst") and get e touez "in a mixture". But this is the case for all MWTs ?
On our to-do list too :-)
On Wed, 2 Sep 2020 at 08:08, Johannes Heinecke <notifications@github.com> wrote:
I propose to redo the MWT of inflected prepositions and merged prepositions + article by putting the uninflected preposition and the standard pronoun in the form (and lemma) column (and delete the indirect lemma), cf. commit 4daa27b https://github.com/UniversalDependencies/UD_Breton-KEB/commit/4daa27b974e67991ff40d4e402e70a01fbe78d05 :
din
1-2 din
1 da da ADP
2 me me PRON
similarly for dit, dezhañ, dezhi, deomp/dimp, deoc'h, dezho: da + te, eñ, hi, ni, c'hwi, int and other prepositions like a, dre, e, e-giz, evit, gant, ouzh, war, nemet, hervez, diouzh
er
1-2 er
1 e e ADP
2 ar an DET
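The proposal above can be sketched in Python as follows. This is only an illustration, not the script used in the commit: the table `INFLECTED_DA` and the function `split_inflected` are hypothetical names, the table lists only the forms of da named in this thread, and all columns other than ID, FORM, LEMMA and UPOS are left as `_`.

```python
# Hypothetical lookup table: inflected forms of "da" mentioned in the thread,
# mapped to (uninflected preposition, standard pronoun).
INFLECTED_DA = {
    "din": ("da", "me"),
    "dit": ("da", "te"),
    "dezhañ": ("da", "eñ"),
    "dezhi": ("da", "hi"),
    "deomp": ("da", "ni"),
    "dimp": ("da", "ni"),
    "deoc'h": ("da", "c'hwi"),
    "dezho": ("da", "int"),
}


def split_inflected(surface, start_id):
    """Return three CoNLL-U lines: the MWT range line, the ADP and the PRON.

    Form and lemma are identical for both syntactic words, as proposed.
    Columns that this sketch does not know about are filled with "_".
    """
    prep, pron = INFLECTED_DA[surface]
    rows = [
        [f"{start_id}-{start_id + 1}", surface] + ["_"] * 8,
        [str(start_id), prep, prep, "ADP"] + ["_"] * 6,
        [str(start_id + 1), pron, pron, "PRON"] + ["_"] * 6,
    ]
    return ["\t".join(row) for row in rows]


if __name__ == "__main__":
    for line in split_inflected("din", 1):
        print(line)
```

The other prepositions listed above (a, dre, e, gant, ...) would need their own tables; they are omitted here because the thread does not spell out their inflected forms.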
Great! BTW I'll do so for Welsh too
And Scottish Gaelic!
One could split it in three ... but we would lose the ADP e-touez ("amongst") and get e touez "in a mixture". But this is the case for all MWTs ?
Since the syntactic words to which you split a MWT do not have to be substrings of the MWT, you can split it in two (which, if I understand it correctly, reflects the real syntactic words that are there – a preposition and a pronoun), make the first form e-touez and the second word would be the real form of the pronoun.
I think I'm missing something here: what would the third word in this example be if 12 is e-touez?
12-14 en o zouez _ _ _ _ _ _ _ _
12 en e ADP pr _ 14 case _ _
13 o int PRON prn Case=Acc|Number=Plur|Person=3|PronType=Prs 15 nmod _ _
14 zouez touez NOUN noun _ 15 nmod _ _
The lemmas are o and e-touez. The problem is that the pronoun o is infixed into the multiword preposition e-touez (whose second part is mutated/lenited to zouez). As I understand it, @dan-zeman proposes splitting en o zouez into o and e-touez:
12-13 en o zouez _ _ _ _ _ _ _ _
12 o o PRON prn Case=Acc|Number=Plur|Person=3|PronType=Prs 14 nmod _ _
13 e-touez e-touez ADP pr _ 12 case _ _
But in any case we would have blanks (spaces) in the MWT. I do not know whether the guidelines allow this.
The guidelines allow it for a closed set of expressions (they need to be defined).
I would probably put e-touez first and then the pronoun after to have the same order for ADP and PRON.
Yes, ADP before PRON is better.
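A minimal Python sketch of the two-word analysis with ADP before PRON, assuming the closed set of MWTs with spaces is kept as a simple allow-list. The names `ALLOWED_SPACED_MWTS` and `mwt_rows` are hypothetical, the set contains only the one expression discussed here, and the pronoun form o is taken from the example above (whether the form should be o or the standard pronoun is left open in this thread).

```python
# Assumption: a project-defined closed set of MWT surface forms that may
# contain spaces. Only the expression from this thread is listed.
ALLOWED_SPACED_MWTS = {"en o zouez"}


def mwt_rows(surface, start_id, words):
    """Build CoNLL-U lines for one multiword token.

    words is a list of (form, lemma, upos) tuples in the order they should
    appear. Surface forms with spaces are rejected unless they are in the
    allow-list. Unknown columns are filled with "_".
    """
    if " " in surface and surface not in ALLOWED_SPACED_MWTS:
        raise ValueError(f"MWT with spaces not in the closed set: {surface!r}")
    end_id = start_id + len(words) - 1
    rows = [[f"{start_id}-{end_id}", surface] + ["_"] * 8]
    for offset, (form, lemma, upos) in enumerate(words):
        rows.append([str(start_id + offset), form, lemma, upos] + ["_"] * 6)
    return ["\t".join(row) for row in rows]


if __name__ == "__main__":
    # ADP first, then PRON, as agreed above.
    words = [("e-touez", "e-touez", "ADP"), ("o", "o", "PRON")]
    for line in mwt_rows("en o zouez", 12, words):
        print(line)
```

Keeping the allow-list explicit makes it easy to see which expressions have been accepted as MWTs with internal spaces.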
There would be no third word, it's basically infixing o inside e-touez (this is a bit of a question about how grammaticalised we think the e-touez "among" expression is).