clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

Misplaced transcriber comments inside sentence #694

Closed: matyaskopp closed this issue 1 year ago

matyaskopp commented 1 year ago

This issue records the current status of how notes that occur inside sentences are placed in the machine-translated (MT) text.

Source (Czech original; the note is inside the sentence):

<s xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8">
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w1" lemma="a" msd="UPosTag=CCONJ">A</w>
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w2" lemma="tvůj" msd="UPosTag=DET|Case=Nom|Gender=Masc|Number=Sing|Number[psor]=Plur|Person=2|Poss=Yes|PronType=Prs">váš</w>
<note type="comment">pozn. sten.: myšleno nejspíše náš</note>
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w3" lemma="hejtman" msd="UPosTag=NOUN|Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing|Polarity=Pos">hejtman</w>
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w4" lemma="dodržet" msd="UPosTag=VERB|Gender=Masc|Number=Sing|Polarity=Pos|Tense=Past|VerbForm=Part|Voice=Act">dodržel</w>
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w5" lemma="zákonnost" msd="UPosTag=NOUN|Case=Acc|Gender=Fem|Number=Sing|Polarity=Pos">zákonnost</w>
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w6" lemma="a" msd="UPosTag=CCONJ">a</w>
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w7" lemma="vyhodit" msd="UPosTag=VERB|Aspect=Perf|Gender=Masc|Number=Sing|Polarity=Pos|Tense=Past|VerbForm=Part|Voice=Act">vyhodil</w>
<w xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w8" lemma="on" msd="UPosTag=PRON|Case=Acc|Gender=Masc,Neut|Number=Sing|Person=3|PronType=Prs|Variant=Short" join="right">ho</w>
<pc xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8.w9" msd="UPosTag=PUNCT">.</pc>
<linkGrp targFunc="head argument" type="UD-SYN"><!-- ... --></linkGrp>
</s>
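Re-placing a note in the translation needs its position in the source sentence, i.e. how many tokens precede it. A minimal sketch using Python's standard-library ElementTree, assuming a single <s> element has been parsed on its own; `note_positions` and `local_name` are hypothetical helper names, and the TEI namespace, if present, is simply ignored:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

TOKEN_TAGS = {"w", "pc"}


def local_name(tag: object) -> str:
    """Drop a '{namespace}' prefix; non-string tags (comments, PIs) map to ''."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def note_positions(sentence: ET.Element) -> List[Tuple[int, str]]:
    """Return (tokens_before, text) for every <note> inside an <s>.

    tokens_before counts the <w>/<pc> elements preceding the note in document
    order: 2 means "after the second token", 0 means "before the first".
    """
    positions: List[Tuple[int, str]] = []
    tokens_seen = 0
    for el in sentence.iter():  # document order, also visits nested tokens
        name = local_name(el.tag)
        if name in TOKEN_TAGS:
            tokens_seen += 1
        elif name == "note":
            positions.append((tokens_seen, el.text or ""))
    return positions
```

For the sentence above this yields a position of 2 for the stenographer's note, which matches the `w2` suffix of the token it follows ("váš").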

MT version (the note is placed after the sentence):

<s xml:id="ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8" n="842" corresp="mt-src:ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8">
<w pos="CC" msd="UPosTag=CCONJ" lemma="and">And</w>
<w pos="PRP$" msd="UPosTag=PRON|Person=2|Poss=Yes|PronType=Prs" lemma="you">your</w>
<w pos="NN" msd="UPosTag=NOUN|Number=Sing" lemma="captain">captain</w>
<w pos="VBD" msd="UPosTag=VERB|Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin" lemma="respect">respected</w>
<w pos="DT" msd="UPosTag=DET|Definite=Def|PronType=Art" lemma="the">the</w>
<w pos="NN" msd="UPosTag=NOUN|Number=Sing" lemma="legality">legality</w>
<w pos="CC" msd="UPosTag=CCONJ" lemma="and">and</w>
<w pos="VBD" msd="UPosTag=VERB|Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin" lemma="fire">fired</w>
<w pos="PRP" msd="UPosTag=PRON|Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs" lemma="he" join="right">him</w>
<pc pos="Z" msd="UPosTag=PUNCT">.</pc>
</s>
<note type="comment" xml:lang="en">(KNOCKING ON DOOR) Sten.: meant probably ours)</note>
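Once a target position is known, moving the note back inside the translated sentence is a small tree operation. A minimal ElementTree sketch, assuming the note is a sibling that follows </s> under a common parent (as in the MT output above) and that the position is given as a token count; `move_note_into_sentence` is a hypothetical name, and how the count is obtained (for instance via word alignment) is left open:

```python
import xml.etree.ElementTree as ET

TOKEN_TAGS = {"w", "pc"}


def local_name(tag: object) -> str:
    """Drop a '{namespace}' prefix; non-string tags (comments, PIs) map to ''."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def count_tokens(el: ET.Element) -> int:
    """Number of <w>/<pc> elements in el, including el itself."""
    return sum(1 for d in el.iter() if local_name(d.tag) in TOKEN_TAGS)


def move_note_into_sentence(parent: ET.Element, sentence: ET.Element,
                            note: ET.Element, after_token: int) -> None:
    """Move <note> (a child of parent) into sentence, right after its
    after_token-th token (0 = before the first token).

    If that token is nested (e.g. inside a <name>), the note goes after the
    whole child of <s> that contains it, so no inline element is split.
    """
    if not 0 <= after_token <= count_tokens(sentence):
        raise ValueError("after_token is outside the sentence")
    parent.remove(note)
    if after_token == 0:
        sentence.insert(0, note)
        return
    seen = 0
    for idx, child in enumerate(list(sentence)):
        seen += count_tokens(child)
        if seen >= after_token:
            sentence.insert(idx + 1, note)  # lands before any trailing <linkGrp>
            return
```

Any downstream step that assumes an <s> contains only tokens and <linkGrp> would then have to skip <note> elements, which is the kind of extra processing cost raised in the reply below.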

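Word alignment could be used for better note placement: put the note after the target word aligned to the source word that immediately precedes the note. In the example below, the note follows source token 2 ("váš"), and target token 2 ("your") has ForwardAlignment=2, so the note would go after "your".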

# sent_id = ParlaMint-CZ_2014-01-22-ps2013-005-02-003-005.u63.p1.s8
# source = A váš hejtman dodržel zákonnost a vyhodil ho.
# text = And your captain respected the legality and fired him.
1   And and CCONJ   CC  _   0   _   _   ForwardAlignment=1|BackwardAlignment=1|NER=O
2   your    you PRON    PRP$    Person=2|Poss=Yes|PronType=Prs  1   _   _   ForwardAlignment=2|BackwardAlignment=2|NER=O
3   captain captain NOUN    NN  Number=Sing 2   _   _   ForwardAlignment=3|BackwardAlignment=3|NER=O
4   respected   respect VERB    VBD Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin   3   _   _   ForwardAlignment=4|BackwardAlignment=4|NER=O
5   the the DET DT  Definite=Def|PronType=Art   4   _   _   NER=O
6   legality    legality    NOUN    NN  Number=Sing 5   _   _   ForwardAlignment=5|BackwardAlignment=5|NER=O
7   and and CCONJ   CC  _   6   _   _   ForwardAlignment=6|BackwardAlignment=6|NER=O
8   fired   fire    VERB    VBD Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin   7   _   _   ForwardAlignment=7|BackwardAlignment=7|NER=O
9   him he  PRON    PRP Case=Acc|Gender=Masc|Number=Sing|Person=3|PronType=Prs  8   _   _   ForwardAlignment=8|BackwardAlignment=8|NER=O|SpaceAfter=No
10  .   .   PUNCT   .   _   9   _   _   ForwardAlignment=9|BackwardAlignment=9|NER=O
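The alignment rule can be sketched as follows, assuming tab-separated CoNLL-U where the MISC (10th) column holds `ForwardAlignment=N` and/or `BackwardAlignment=N` with N a 1-based source token index, as in the output above. Treating both keys as one undirected alignment, allowing comma-separated values, and falling back to the nearest earlier aligned source token are all assumptions; `parse_alignments` and `note_anchor` are hypothetical names:

```python
from typing import Dict, List, Optional, Set

ALIGN_KEYS = ("ForwardAlignment", "BackwardAlignment")


def parse_alignments(conllu_lines: List[str]) -> Dict[int, Set[int]]:
    """Map each target token ID to the set of source token IDs it is aligned to."""
    alignments: Dict[int, Set[int]] = {}
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (sent_id, source, text)
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        src_ids: Set[int] = set()
        for item in cols[9].split("|"):
            key, _, value = item.partition("=")
            if key in ALIGN_KEYS and value:
                src_ids.update(int(v) for v in value.split(","))
        alignments[int(cols[0])] = src_ids
    return alignments


def note_anchor(alignments: Dict[int, Set[int]], src_pos: int) -> Optional[int]:
    """Return the target token ID after which to place a note.

    src_pos is the ID of the source token immediately preceding the note
    (0 = the note opens the sentence). If that token is unaligned, fall back
    to the nearest earlier aligned source token. 0 means "before the first
    target token"; None means no anchor was found (keep the note after </s>).
    """
    if src_pos == 0:
        return 0
    for k in range(src_pos, 0, -1):
        targets = [t for t, srcs in alignments.items() if k in srcs]
        if targets:
            return max(targets)  # last target token aligned to source token k
    return None
```

For the sentence above, `note_anchor(parse_alignments(lines), 2)` returns 2, i.e. the note would be placed after "your". Taking the last aligned target token keeps the note after the whole translation of a source word that was rendered as several English words.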
TomazErjavec commented 1 year ago

@matyaskopp this was a conscious decision: I think that placing comments inside a sentence, rather than after it, would bring minimal benefit but would greatly complicate the processing. So I wouldn't fix this. If you want to do it, OK (although, as I said, I don't think it is very useful); otherwise, please close.