Closed mrudolf closed 1 year ago
I'm not sure what @matyaskopp's take is on this, but I now correct these errors in the finalization step of the corpus processing, so, as far as I am concerned, you do not need to fix them. But, for the curious, in UD only the token that is the root of the syntax tree can have the "root" dependency. So, all other tokens that have this dependency label should have it substituted by "dep". This is the fix: https://github.com/clarin-eric/ParlaMint/blob/031ec3009386a4bfec60bf0e22f653a813ddf98c/Scripts/parlamint2final.xsl#L357-L377
Thanks.
If I can fix this easily, I can do it. I am not sure I understand the fix though, is there any example of the code before and after the correction I can compare?
If I can fix this easily, I can do it.
It is a simple fix.
I am not sure I understand the fix though, is there any example of the code before and after the correction I can compare?
Error:
<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:root"
target="#ParlaMint-XX_2016-04-13.u1.p1.s1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w1"/>
<link ana="ud-syn:root"
target="#ParlaMint-XX_2016-04-13.u1.p1.s1.w1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w2"/>
</linkGrp>
Corrected:
<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:root"
target="#ParlaMint-XX_2016-04-13.u1.p1.s1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w1"/>
<link ana="ud-syn:dep"
target="#ParlaMint-XX_2016-04-13.u1.p1.s1.w1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w2"/>
</linkGrp>
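For anyone post-processing outside the XSLT pipeline, the before/after change above can be sketched in Python. This is only an illustration and not a port of the linked XSLT script: the function name `demote_inner_roots` and the wrapping `<s>` element in the sample are assumptions, and elements are matched by local name so that a declared TEI namespace should not matter.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def demote_inner_roots(sentence):
    """Relabel ud-syn:root as ud-syn:dep on links whose head is not the sentence.

    Hypothetical helper: `sentence` is an <s> element carrying xml:id.
    Returns the number of links that were changed.
    """
    sent_ref = "#" + sentence.get(XML_ID)
    changed = 0
    for el in sentence.iter():
        # Compare local names so namespaced and plain <link> both match.
        if el.tag.rsplit("}", 1)[-1] != "link":
            continue
        head, _argument = el.get("target").split()
        if el.get("ana") == "ud-syn:root" and head != sent_ref:
            el.set("ana", "ud-syn:dep")
            changed += 1
    return changed


# Sample built from the "Error" snippet, wrapped in an assumed <s> element.
SAMPLE = """<s xml:id="ParlaMint-XX_2016-04-13.u1.p1.s1">
<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:root"
 target="#ParlaMint-XX_2016-04-13.u1.p1.s1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w1"/>
<link ana="ud-syn:root"
 target="#ParlaMint-XX_2016-04-13.u1.p1.s1.w1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w2"/>
</linkGrp>
</s>"""

sentence = ET.fromstring(SAMPLE)
changed = demote_inner_roots(sentence)
labels = [link.get("ana") for link in sentence.iter("link")]
print(changed, labels)
```

Note that ElementTree drops XML comments and may rewrite namespace prefixes when serializing, so a byte-preserving tool may be a better fit if the files must otherwise stay identical.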
this is an error message:
[Line 292 Sent seg21.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
wrong syntax:
<s xml:id="seg21.1">
<w lemma="pan" msd="UPosTag=NOUN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.1">Panie</w>
<w join="right" lemma="marszałek" msd="UPosTag=PROPN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.2">Marszałku</w>
<pc msd="UPosTag=PUNCT|PunctType=Excl" xml:id="seg21.1.3">!</pc>
<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:vocative" target="#seg21.1 #seg21.1.1"/>
<link ana="ud-syn:appos" target="#seg21.1.1 #seg21.1.2"/>
<link ana="ud-syn:punct" target="#seg21.1.1 #seg21.1.3"/>
</linkGrp>
</s>
corrected syntax:
<s xml:id="seg21.1">
<w lemma="pan" msd="UPosTag=NOUN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.1">Panie</w>
<w join="right" lemma="marszałek" msd="UPosTag=PROPN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.2">Marszałku</w>
<pc msd="UPosTag=PUNCT|PunctType=Excl" xml:id="seg21.1.3">!</pc>
<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:root" target="#seg21.1 #seg21.1.1"/> <!-- CHANGING RELATION -->
<link ana="ud-syn:appos" target="#seg21.1.1 #seg21.1.2"/>
<link ana="ud-syn:punct" target="#seg21.1.1 #seg21.1.3"/>
</linkGrp>
</s>
When the head is the sentence id, then ana should be ud-syn:root, so it is simple to fix even with sed.
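A text-level substitution in the spirit of the sed suggestion could look like the sketch below. It rests on assumptions that are not stated in the thread: that `ana` precedes `target` inside each `<link>`, and that a token id is always the sentence id plus a dot and a suffix (as in `seg21.1` and `seg21.1.1`), which is how the pattern recognizes a sentence head. The names `ROOT_HEAD_LINK` and `promote_sentence_links` are made up for the example.

```python
import re

# Matches a <link> whose argument id is "<head id>.<suffix>", i.e. whose head
# is assumed to be the sentence itself. Group 3 captures the head id and is
# reused as a backreference inside the argument id.
ROOT_HEAD_LINK = re.compile(
    r'(<link\s+ana="ud-syn:)[^"]+("\s+target="#(\S+) #\3\.[^."\s]+"\s*/>)'
)


def promote_sentence_links(xml_text):
    """Set ana to ud-syn:root on every link whose head is the sentence id."""
    return ROOT_HEAD_LINK.sub(r"\g<1>root\g<2>", xml_text)


# The linkGrp from the "wrong syntax" example above.
SAMPLE = """<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:vocative" target="#seg21.1 #seg21.1.1"/>
<link ana="ud-syn:appos" target="#seg21.1.1 #seg21.1.2"/>
<link ana="ud-syn:punct" target="#seg21.1.1 #seg21.1.3"/>
</linkGrp>"""

fixed = promote_sentence_links(SAMPLE)
print(fixed)
```

The substitution overwrites whatever relation the link had before (here `vocative`), which matches the correction shown above but discards the original label.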
@TomazErjavec gave you a different example "root inside the tree":
[L2 Syntax root-is-not-0] DEPREL cannot be 'root' if HEAD is not 0.
My sample shows "root node does not have root relation":
[L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
A good thing we both gave examples then :)
Did both appear in my data?
Did both appear in my data?
Yes, even in a single sentence:
<s xml:id="seg160.1">
<pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.1">.</pc>
<pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.2">.</pc>
<pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.3">.</pc>
<w lemma="o" msd="UPosTag=ADP|AdpType=Prep" xml:id="seg160.1.4">o</w>
<w join="right" lemma="artykuł" msd="UPosTag=X|Abbr=Yes|Pun=Yes" xml:id="seg160.1.5">art</w>
<pc msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.6">.</pc>
<w join="right" lemma="30" msd="UPosTag=ADJ|Case=Loc|Degree=Pos|Gender=Masc|NumForm=Digit|NumType=Ord|Number=Sing" xml:id="seg160.1.7">30</w>
<pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.8">.</pc>
<pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.9">.</pc>
<pc msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.10">.</pc>
<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:punct" target="#seg160.1 #seg160.1.1"/> <!-- should be root -->
<link ana="ud-syn:punct" target="#seg160.1.1 #seg160.1.2"/>
<link ana="ud-syn:punct" target="#seg160.1.2 #seg160.1.3"/>
<link ana="ud-syn:case" target="#seg160.1.5 #seg160.1.4"/>
<link ana="ud-syn:root" target="#seg160.1.1 #seg160.1.5"/><!-- should be dep -->
<link ana="ud-syn:punct" target="#seg160.1.5 #seg160.1.6"/>
<link ana="ud-syn:nmod" target="#seg160.1.5 #seg160.1.7"/>
<link ana="ud-syn:punct" target="#seg160.1.5 #seg160.1.8"/>
<link ana="ud-syn:punct" target="#seg160.1.8 #seg160.1.9"/>
<link ana="ud-syn:punct" target="#seg160.1.9 #seg160.1.10"/>
</linkGrp>
</s>
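To see which of the two validator errors a sentence contains, before or after post-processing, a small checker along these lines may help. The error codes are copied from the validator messages quoted in this thread. The function name `check_root_links` and the shortened sample are assumptions, and this is not part of the UD validator itself.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_root_links(sentence):
    """Return (argument id, error code) pairs for root-related problems.

    Hypothetical helper: `sentence` is an <s> element carrying xml:id.
    """
    sent_ref = "#" + sentence.get(XML_ID)
    problems = []
    for el in sentence.iter():
        if el.tag.rsplit("}", 1)[-1] != "link":
            continue
        head, argument = el.get("target").split()
        is_root = el.get("ana") == "ud-syn:root"
        if head == sent_ref and not is_root:
            # Head is the sentence, but the relation is not root.
            problems.append((argument, "0-is-not-root"))
        elif head != sent_ref and is_root:
            # Relation is root, but the head is a token.
            problems.append((argument, "root-is-not-0"))
    return problems


# Shortened version of the seg160.1 example (tokens omitted, fewer links).
SAMPLE = """<s xml:id="seg160.1">
<linkGrp targFunc="head argument" type="UD-SYN">
<link ana="ud-syn:punct" target="#seg160.1 #seg160.1.1"/>
<link ana="ud-syn:punct" target="#seg160.1.1 #seg160.1.2"/>
<link ana="ud-syn:case" target="#seg160.1.5 #seg160.1.4"/>
<link ana="ud-syn:root" target="#seg160.1.1 #seg160.1.5"/>
<link ana="ud-syn:punct" target="#seg160.1.5 #seg160.1.6"/>
</linkGrp>
</s>"""

problems = check_root_links(ET.fromstring(SAMPLE))
for argument, code in problems:
    print(argument, code)
```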
Thanks.
Reiterating to ensure I got everything right: <s> is always root. Is that correct?
It is.
In task #573 there is a section "Huge amount of L2 syntax errors".
Can you give me an example of a bug and how it should be corrected? The analyser cannot be upgraded, so I will have to do some postprocessing.