clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

PL – help needed on syntactic annotation error #611

Closed mrudolf closed 1 year ago

mrudolf commented 1 year ago

In task #573 there is a section titled "Huge amount of L2 syntax errors".

Can you give me an example of a bug and how it should be corrected? The analyser cannot be upgraded, so I will have to do some postprocessing.

TomazErjavec commented 1 year ago

I'm not sure what @matyaskopp's take is on this, but I now correct these errors in the finalization step of the corpus processing, so, as far as I am concerned, you do not need to fix them. For the curious: in UD, only the token that is the root of the syntax tree can have the "root" dependency, so every other token that carries this dependency label should have it replaced by "dep". This is the fix: https://github.com/clarin-eric/ParlaMint/blob/031ec3009386a4bfec60bf0e22f653a813ddf98c/Scripts/parlamint2final.xsl#L357-L377

mrudolf commented 1 year ago

Thanks.

If I can fix this easily, I can do it. I am not sure I understand the fix, though. Is there an example of the code before and after the correction that I can compare?

TomazErjavec commented 1 year ago

> If I can fix this easily, I can do it.

It is a simple fix.

> I am not sure I understand the fix though, is there any example of the code before and after the correction I can compare?

Error:

<linkGrp targFunc="head argument" type="UD-SYN">
   <link ana="ud-syn:root"
         target="#ParlaMint-XX_2016-04-13.u1.p1.s1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w1"/>
   <link ana="ud-syn:root"
         target="#ParlaMint-XX_2016-04-13.u1.p1.s1.w1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w2"/>
</linkGrp>

Corrected:

<linkGrp targFunc="head argument" type="UD-SYN">
   <link ana="ud-syn:root"
         target="#ParlaMint-XX_2016-04-13.u1.p1.s1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w1"/>
   <link ana="ud-syn:dep"
         target="#ParlaMint-XX_2016-04-13.u1.p1.s1.w1 #ParlaMint-XX_2016-04-13.u1.p1.s1.w2"/>
</linkGrp>

matyaskopp commented 1 year ago

this is an error message:

[Line 292 Sent seg21.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.

wrong syntax:

            <s xml:id="seg21.1">
              <w lemma="pan" msd="UPosTag=NOUN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.1">Panie</w>
              <w join="right" lemma="marszałek" msd="UPosTag=PROPN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.2">Marszałku</w>
              <pc msd="UPosTag=PUNCT|PunctType=Excl" xml:id="seg21.1.3">!</pc>
              <linkGrp targFunc="head argument" type="UD-SYN">
                <link ana="ud-syn:vocative" target="#seg21.1 #seg21.1.1"/>
                <link ana="ud-syn:appos" target="#seg21.1.1 #seg21.1.2"/>
                <link ana="ud-syn:punct" target="#seg21.1.1 #seg21.1.3"/>
              </linkGrp>
            </s>

corrected syntax:

            <s xml:id="seg21.1">
              <w lemma="pan" msd="UPosTag=NOUN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.1">Panie</w>
              <w join="right" lemma="marszałek" msd="UPosTag=PROPN|Animacy=Hum|Case=Voc|Gender=Masc|Number=Sing" xml:id="seg21.1.2">Marszałku</w>
              <pc msd="UPosTag=PUNCT|PunctType=Excl" xml:id="seg21.1.3">!</pc>
              <linkGrp targFunc="head argument" type="UD-SYN">
                <link ana="ud-syn:root" target="#seg21.1 #seg21.1.1"/> <!-- CHANGING RELATION -->
                <link ana="ud-syn:appos" target="#seg21.1.1 #seg21.1.2"/>
                <link ana="ud-syn:punct" target="#seg21.1.1 #seg21.1.3"/>
              </linkGrp>
            </s>

When the head is the sentence ID, ana should be ud-syn:root, so this is simple to fix, even with sed.

matyaskopp commented 1 year ago

@TomazErjavec gave you a different example "root inside the tree":

[L2 Syntax root-is-not-0] DEPREL cannot be 'root' if HEAD is not 0.

My sample shows "root node does not have root relation":

[L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.

TomazErjavec commented 1 year ago

A good thing we both gave examples then :)
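
For anyone who wants to check which of the two error classes occurs in their own data, here is a minimal sketch in Python. It is not part of the ParlaMint tooling: the names `classify_links` and `local_name` are made up for illustration, and it assumes each `<s>` carries an `xml:id` and that the first reference in `target` is the head, as `targFunc="head argument"` indicates.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(element):
    """Tag name without a namespace, so TEI-namespaced files also match."""
    return element.tag.rsplit("}", 1)[-1]


def classify_links(sentence):
    """Return (error_class, target) pairs for the UD-SYN links of one <s>.

    Hypothetical helper: the error class names mirror the validator
    messages quoted in this thread.
    """
    sentence_ref = "#" + sentence.get(XML_ID)
    errors = []
    for group in sentence.iter():
        if local_name(group) != "linkGrp" or group.get("type") != "UD-SYN":
            continue
        for link in group:
            if local_name(link) != "link":
                continue
            head = link.get("target").split()[0]  # first reference is the head
            is_root = link.get("ana") == "ud-syn:root"
            if head == sentence_ref and not is_root:
                # Head is the sentence itself, but the relation is not root.
                errors.append(("0-is-not-root", link.get("target")))
            elif head != sentence_ref and is_root:
                # Relation is root, but the head is an ordinary token.
                errors.append(("root-is-not-0", link.get("target")))
    return errors
```

Calling `classify_links(ET.fromstring(sentence_xml))` on a single `<s>` element returns an empty list when neither class is present.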

mrudolf commented 1 year ago

Did both appear in my data?

matyaskopp commented 1 year ago

> Did both appear in my data?

Yes, even in a single sentence:

            <s xml:id="seg160.1">
              <pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.1">.</pc>
              <pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.2">.</pc>
              <pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.3">.</pc>
              <w lemma="o" msd="UPosTag=ADP|AdpType=Prep" xml:id="seg160.1.4">o</w>
              <w join="right" lemma="artykuł" msd="UPosTag=X|Abbr=Yes|Pun=Yes" xml:id="seg160.1.5">art</w>
              <pc msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.6">.</pc>
              <w join="right" lemma="30" msd="UPosTag=ADJ|Case=Loc|Degree=Pos|Gender=Masc|NumForm=Digit|NumType=Ord|Number=Sing" xml:id="seg160.1.7">30</w>
              <pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.8">.</pc>
              <pc join="right" msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.9">.</pc>
              <pc msd="UPosTag=PUNCT|PunctType=Peri" xml:id="seg160.1.10">.</pc>
              <linkGrp targFunc="head argument" type="UD-SYN">
                <link ana="ud-syn:punct" target="#seg160.1 #seg160.1.1"/> <!-- should be root -->
                <link ana="ud-syn:punct" target="#seg160.1.1 #seg160.1.2"/>
                <link ana="ud-syn:punct" target="#seg160.1.2 #seg160.1.3"/>
                <link ana="ud-syn:case" target="#seg160.1.5 #seg160.1.4"/>
                <link ana="ud-syn:root" target="#seg160.1.1 #seg160.1.5"/><!-- should be dep -->
                <link ana="ud-syn:punct" target="#seg160.1.5 #seg160.1.6"/>
                <link ana="ud-syn:nmod" target="#seg160.1.5 #seg160.1.7"/>
                <link ana="ud-syn:punct" target="#seg160.1.5 #seg160.1.8"/>
                <link ana="ud-syn:punct" target="#seg160.1.8 #seg160.1.9"/>
                <link ana="ud-syn:punct" target="#seg160.1.9 #seg160.1.10"/>
              </linkGrp>
            </s>

mrudolf commented 1 year ago

Thanks.

Reiterating to ensure I got everything right.

  1. The element linking to <s> is always root.
  2. No other element is root; every other root should become dep.

Is that correct?

TomazErjavec commented 1 year ago

It is.
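
The two rules confirmed above could be applied in a postprocessing step roughly as follows. This is only a sketch in Python under stated assumptions, not the XSLT fix linked earlier in the thread: `fix_root_links` and `local_name` are hypothetical names, and the code assumes the first reference in `target` is the head and that only `linkGrp` elements with `type="UD-SYN"` should be touched.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(element):
    """Tag name without a namespace, so TEI-namespaced files also match."""
    return element.tag.rsplit("}", 1)[-1]


def fix_root_links(tree_root):
    """Apply both rules in place and return the number of changed links.

    Rule 1: a link whose head is the sentence itself becomes ud-syn:root.
    Rule 2: any other link labelled ud-syn:root becomes ud-syn:dep.
    """
    changed = 0
    for sentence in tree_root.iter():
        if local_name(sentence) != "s" or sentence.get(XML_ID) is None:
            continue
        sentence_ref = "#" + sentence.get(XML_ID)
        for group in sentence.iter():
            if local_name(group) != "linkGrp" or group.get("type") != "UD-SYN":
                continue
            for link in group:
                if local_name(link) != "link":
                    continue
                head = link.get("target").split()[0]  # first reference is the head
                is_root = link.get("ana") == "ud-syn:root"
                if head == sentence_ref and not is_root:
                    link.set("ana", "ud-syn:root")
                    changed += 1
                elif head != sentence_ref and is_root:
                    link.set("ana", "ud-syn:dep")
                    changed += 1
    return changed
```

For a whole file, one would parse it with `ET.parse(path)`, call `fix_root_links(tree.getroot())`, and write the tree back out. Note that ElementTree's default parser drops XML comments and may change namespace prefixes on output, so the result should be diffed against the input before use.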