elliottlash12 / UD_Old_Irish-CritMinorGlossesMilan

A Universal Dependencies Treebank for the Old Irish "Minor Glosses"

morph:word correspondence #41

Open rubywku opened 2 years ago


Two cases associated with morph:word correspondence have been observed so far:

1. Wrongly split word: if a word is erroneously segmented into two parts, append |Typo=Yes to the Features (FEATS) column and |CorrectForm= (followed by the correct spelling) to the last column (MISC) of the first word part. Then insert a goeswith dependent (with an empty lemma, a UPOS of X, and empty FEATS) to attach the second part to the first. The IDs of all following lines, and any HEAD values that point to them, must be renumbered accordingly. Example:

# sent_id = S0006-2962
# text = .i. dei dodilgud apecthe n do·
# text_en = i.e. of God, for forgiveness to him of his sins.
1   .i. .i. CCONJ   abbreviation    _   2   cc  X   Gloss=id est, it is
2   dei deus    PROPN   proper_noun _   0   root    X   Gloss=god, deity, God
3-4 dodilgud    _   _   _   _   _   _   _   _
3   do  do 1    ADP preposition Case=Dat    4   mark:prt    X   Gloss=to, for, by etc.
4   dilgud  dílgud  NOUN    verbal_noun Case=Dat|Number=Sing|VerbForm=VNoun 2   xcomp   X   Gloss=the act of forgiving; forgiveness, pardon, remission, forgiving
5-6 apecthe _   _   _   _   _   _   _   _
5   a   3sg.masc./neut.poss.pron.   DET pronoun_possessive  Poss=Yes|PronType=Prs|Person=3  6   nmod:poss   X   Gloss=his, its
6   pecthe  peccad  NOUN    noun    Case=Gen|Number=Plur    4   obj X   Gloss=sin
7   n   do 1    ADP preposition Person=3|Number=Sing|Gender=Masc|Case=Dat|Typo=Yes  4   obl:prep    X   Gloss=to, for, by etc.|CorrectForm=ndo
8   do  _   X   _   _   7   goeswith    X   _
9   ·   ·   PUNCT   punctuation _   _   _   _

2. Wrongly merged words: in this case, append |SpaceAfter=No|CorrectSpaceAfter=Yes to the last column (MISC) of the current word to indicate that a space is missing by error. Then append |Typo=Yes to the Features (FEATS) column and |CorrectForm= (followed by the correct spelling) to the last column of the next word to record what the correct form should be at that position. Insert a concatenated row (an ID range such as 5-6) if necessary. Example:

# sent_id = S0006-2560
# text = .i. innafiugrae fris inrúin
# text_en = i.e. of the figure to the mystic sense.
1   .i. .i. CCONJ   abbreviation    _   _   cc  X   Gloss=id est, it is
2-3 innafiugrae _   _   _   _   _   _   _   _
2   inna    in 1    DET definite_article    Case=Gen|Number=Sing|Gender=Fem|Definite=Def|PronType=Art   3   det X   Gloss=the
3   fiugrae figor   NOUN    noun    Case=Gen|Number=Sing    _   _   X   Gloss=figure, type, symbol
4   fri fri ADP preposition Case=Acc    6   case    X   Gloss=towards, against, standard of comparison|SpaceAfter=No|CorrectSpaceAfter=Yes
5-6 inrúin  _   _   _   _   _   _   _   _
5   in  in 1    DET definite_article    Case=Acc|Number=Sing|Gender=Fem|Definite=Def|PronType=Art|Typo=Yes  6   det X   Gloss=the|CorrectForm=sin
6   rúin    rún NOUN    noun    Case=Acc|Number=Sing    _   _   X   Gloss=mystery, secret
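For anyone who wants to apply the two conventions above in bulk, here is a minimal Python sketch. It is not part of this repository's tooling: the helper names (`add_item`, `shift`, `mark_split_word`, `mark_merged_words`) are hypothetical, and it assumes each token line has already been split into the ten CoNLL-U columns. New items are appended at the end of FEATS and MISC, as described above, without any re-sorting.

```python
from typing import List

# CoNLL-U column indices (ten columns per token line).
ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC = range(10)
Row = List[str]


def add_item(column: str, item: str) -> str:
    """Append a Key=Value item to FEATS or MISC; '_' marks an empty column."""
    return item if column in ("", "_") else f"{column}|{item}"


def shift(value: str, threshold: int) -> str:
    """Add 1 to an ID, HEAD or 'a-b' range wherever it exceeds threshold."""
    parts = value.split("-")
    if not all(p.isdigit() for p in parts):
        return value  # '_' and other non-numeric values are left alone
    return "-".join(str(int(p) + 1) if int(p) > threshold else p for p in parts)


def mark_split_word(rows: List[Row], first_id: int,
                    second_form: str, correct_form: str) -> List[Row]:
    """Case 1: flag the first part, insert a goeswith row, renumber the rest."""
    out: List[Row] = []
    for original in rows:
        row = list(original)  # copy so the input is not modified
        is_first_part = row[ID] == str(first_id)
        row[ID] = shift(row[ID], first_id)
        row[HEAD] = shift(row[HEAD], first_id)
        out.append(row)
        if is_first_part:
            row[FEATS] = add_item(row[FEATS], "Typo=Yes")
            row[MISC] = add_item(row[MISC], f"CorrectForm={correct_form}")
            # Assumption: DEPS is copied from the first part, as in the example.
            out.append([str(first_id + 1), second_form, "_", "X", "_", "_",
                        str(first_id), "goeswith", row[DEPS], "_"])
    return out


def mark_merged_words(rows: List[Row], current_id: int,
                      correct_next_form: str) -> List[Row]:
    """Case 2: flag the missing space on one word and the typo on the next."""
    out = [list(row) for row in rows]
    for row in out:
        if row[ID] == str(current_id):
            row[MISC] = add_item(row[MISC], "SpaceAfter=No|CorrectSpaceAfter=Yes")
        elif row[ID] == str(current_id + 1):
            row[FEATS] = add_item(row[FEATS], "Typo=Yes")
            row[MISC] = add_item(row[MISC], f"CorrectForm={correct_next_form}")
    return out


# Usage, with rows reconstructed from the two examples above
# (the "before" state of each row is an assumption).
before_split = [
    ["6", "pecthe", "peccad", "NOUN", "noun", "Case=Gen|Number=Plur",
     "4", "obj", "X", "Gloss=sin"],
    ["7", "n", "do 1", "ADP", "preposition",
     "Person=3|Number=Sing|Gender=Masc|Case=Dat",
     "4", "obl:prep", "X", "Gloss=to, for, by etc."],
    ["8", "·", "·", "PUNCT", "punctuation", "_", "_", "_", "_", "_"],
]
split = mark_split_word(before_split, 7, "do", "ndo")

before_merged = [
    ["4", "fri", "fri", "ADP", "preposition", "Case=Acc",
     "6", "case", "X", "Gloss=towards, against, standard of comparison"],
    ["5", "in", "in 1", "DET", "definite_article",
     "Case=Acc|Number=Sing|Gender=Fem|Definite=Def|PronType=Art",
     "6", "det", "X", "Gloss=the"],
]
merged = mark_merged_words(before_merged, 4, "sin")

for row in split + merged:
    print("\t".join(row))
```

The renumbering in `shift` also covers range IDs such as 5-6, so concatenated rows after the insertion point stay consistent with the tokens they span.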

Ref: https://universaldependencies.org/u/overview/typos.html