UniversalDependencies / UD_Irish-IDT

Irish data

Form=Len missing from some Noun morph features #47

Closed. tlynn747 closed this issue 3 years ago.

tlynn747 commented 3 years ago

See sentence 1594.

In the POS source file, thábhairneoirí is tagged tábhairneoir+Noun+Masc+Com+Pl+Len.

However, Form=Len is missing from its treebank features:

18	ar	ar	ADP	Simp	_	19	case	_	_
19	thábhairneoirí	tábhairneoir	NOUN	Noun	Case=NomAcc|Gender=Fem|Number=Plur	7	obl	_	_
20	na	na	DET	Art	PronType=Art	21	det	_	_
21	tíre	tír	NOUN	Noun	Case=Gen|Definite=Def|Gender=Fem|Number=Sing	19	nmod	_	_
22	seo	seo	DET	Det	PronType=Dem	21	det	_	SpaceAfter=No
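A minimal sketch of how such mismatches could be detected, assuming the source analyses are '+'-separated strings like the one above and the treebank is standard 10-column, tab-separated CoNLL-U. The names `parse_feats` and `missing_form_len` are hypothetical helpers, not part of any UD tool.

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Parse a CoNLL-U FEATS column ('Name=Value|Name=Value' or '_') into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def missing_form_len(conllu_line: str, source_analysis: str) -> bool:
    """Hypothetical check: True if the source analysis carries +Len but the
    token's FEATS column does not contain Form=Len."""
    cols = conllu_line.rstrip("\n").split("\t")
    feats = parse_feats(cols[5])  # column 6 of a CoNLL-U token line is FEATS
    source_has_len = "Len" in source_analysis.split("+")[1:]  # skip the lemma
    treebank_has_len = "Len" in feats.get("Form", "").split(",")
    return source_has_len and not treebank_has_len


row = "\t".join([
    "19", "thábhairneoirí", "tábhairneoir", "NOUN", "Noun",
    "Case=NomAcc|Gender=Fem|Number=Plur", "7", "obl", "_", "_",
])
print(missing_form_len(row, "tábhairneoir+Noun+Masc+Com+Pl+Len"))  # True
```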

At the same time, some lenited forms also lack the Len marker in the POS source files, so it is clear where the confusion in the predicted features comes from:

Sentence 1682, source:
Sa i+Prep+Art+Sg
bhliain bliain+Noun+Fem+Com+Sg+DefArt

17	tharla	tarlaigh	VERB	VTI	Form=Len|Mood=Ind|Tense=Past	1	conj	_	_
18	an	an	DET	Art	Definite=Def|Number=Sing|PronType=Art	19	det	_	_
19	tubaiste	tubaiste	NOUN	Noun	Gender=Fem|Number=Sing	17	nsubj	_	_
20	sin	sin	DET	Det	PronType=Dem	19	det	_	_
21	sa	i	ADP	Art	Number=Sing|PronType=Art	22	case	_	_
22	bhliain	bliain	NOUN	Noun	Definite=Def|Gender=Fem|Number=Sing	17	obl:tmod	_	_
23	1909	1909	NUM	Num	_	22	nmod	_	SpaceAfter=No
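Cases like bhliain, where the source analysis itself lacks +Len, could perhaps be flagged with a spelling heuristic. This sketch assumes lenition shows up as an 'h' inserted after a lenitable initial consonant of the lemma (bliain, bhliain); `looks_lenited` and `source_missing_len` are hypothetical helpers, not a full morphological check, and irregular forms may be missed or misjudged.

```python
LENITABLE = set("bcdfgmpst")  # initial consonants that can be lenited in Irish spelling


def looks_lenited(form: str, lemma: str) -> bool:
    """Heuristic: the form starts with the lemma's lenitable initial consonant,
    then an inserted 'h', then the lemma's second letter (bliain -> bhliain).
    Only the first letters are compared, so inflected forms such as
    thábhairneoirí (lemma tábhairneoir) still match."""
    f, l = form.lower(), lemma.lower()
    if len(f) < 3 or len(l) < 2:
        return False
    return (
        l[0] in LENITABLE
        and f[0] == l[0]
        and f[1] == "h"
        and l[1] != "h"
        and f[2] == l[1]
    )


def source_missing_len(form: str, source_analysis: str) -> bool:
    """True if the form looks lenited but its '+'-separated source analysis has no +Len tag."""
    lemma, *tags = source_analysis.split("+")
    return looks_lenited(form, lemma) and "Len" not in tags


print(source_missing_len("bhliain", "bliain+Noun+Fem+Com+Sg+DefArt"))  # True
print(source_missing_len("thábhairneoirí", "tábhairneoir+Noun+Masc+Com+Pl+Len"))  # False
```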

kscanne commented 3 years ago

I should be able to handle these fixes automatically, depending on your answer to my question in #35.
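One way such a fix could be automated is sketched below; it is not kscanne's actual script, which the thread does not show. It assumes 10-column, tab-separated CoNLL-U token lines without trailing newlines, and that FEATS should stay sorted alphabetically by feature name (case-insensitive ordering is an assumption here). Choosing which tokens to fix, for example via the source +Len tag, is left to the caller; `add_form_len` and `fix_conllu_line` are hypothetical names.

```python
def add_form_len(feats: str) -> str:
    """Return a FEATS value with Form=Len added, keeping features sorted by name.
    Any existing Form values are kept and Len is merged in (comma-separated, sorted)."""
    pairs = {} if feats == "_" else dict(p.split("=", 1) for p in feats.split("|"))
    values = set(pairs.get("Form", "").split(",")) - {""}
    values.add("Len")
    pairs["Form"] = ",".join(sorted(values))
    # Sort by feature name; case-insensitive ordering is an assumption.
    ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


def fix_conllu_line(line: str) -> str:
    """Add Form=Len to the FEATS column (index 5) of a single CoNLL-U token line.
    Comment lines, blank lines, and multiword/empty-node lines are returned unchanged."""
    cols = line.split("\t")
    if len(cols) != 10 or not cols[0].isdigit():
        return line
    cols[5] = add_form_len(cols[5])
    return "\t".join(cols)


row = "\t".join(["22", "bhliain", "bliain", "NOUN", "Noun",
                 "Definite=Def|Gender=Fem|Number=Sing", "17", "obl:tmod", "_", "_"])
print(fix_conllu_line(row).split("\t")[5])
# Definite=Def|Form=Len|Gender=Fem|Number=Sing
```

Because `add_form_len` is idempotent, rerunning it on a token that already has Form=Len (such as tharla above) leaves its FEATS unchanged.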