UniversalDependencies / UD_Portuguese-Bosque

This is the Universal Dependencies (UD) Portuguese treebank.

release 2.5 - serious errors #273

Closed arademaker closed 3 years ago

arademaker commented 5 years ago

These are the 73 errors that need to be fixed before the 2.5 release:

[Line 63 Sent CF144-3 Node 15]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (15:ponto:fixed --> 14:o:det)
[Line 15 Sent CF161-1 Node 10]: [L3 Syntax rel-upos-det] 'det' should be 'DET' or 'PRON' but it is 'PROPN'
[Line 88 Sent CF344-3]: [L2 Morpho repeated-feature] Repeated features are disallowed: 'Gender=Masc|Number=Sing|PronType=Art|PronType=Dem'.
[Line 124 Sent CF391-4 Node 17]: [L3 Syntax orphan-parent] The parent of 'orphan' should normally be 'conj' but it is 'xcomp'.
[Line 166 Sent CF391-4 Node 55]: [L3 Syntax orphan-parent] The parent of 'orphan' should normally be 'conj' but it is 'orphan'.
[Line 71 Sent CF433-3 Node 29]: [L3 Syntax right-to-left-flat] Relation 'flat:name' must go left-to-right.
[Line 60 Sent CF525-8 Node 4]: [L3 Syntax rel-upos-det] 'det' should be 'DET' or 'PRON' but it is 'PROPN'
[Line 117 Sent CF565-2 Node 15]: [L3 Syntax right-to-left-fixed] Relation 'fixed' must go left-to-right.
[Line 114 Sent CF589-3 Node 26]: [L3 Syntax rel-upos-advmod] 'advmod' should be 'ADV' but it is 'PRON'
[Line 47 Sent CF720-2 Node 24]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (24:depois:mark --> 23:anos:nmod)
[Line 42 Sent CF735-4 Node 8]: [L3 Syntax orphan-parent] The parent of 'orphan' should normally be 'conj' but it is 'obj'.
[Line 8 Sent CF865-1 Node 1]: [L3 Syntax leaf-cc] 'cc' not expected to have children (1:Além:cc --> 3:isso:obl)
[Line 48 Sent CF969-2 Node 17]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (17:ser:fixed --> 24:final:obl)
[Line 246 Sent CP2-5 Node 22]: [L3 Syntax rel-upos-advmod] 'advmod' should be 'ADV' but it is 'NOUN'
[Line 10 Sent CP11-2 Node 6]: [L3 Syntax leaf-aux-cop] 'cop' not expected to have children (6:foi:cop --> 5:que:nsubj)
[Line 54 Sent CP12-2 Node 10]: [L5 Syntax cop-lemma] 'ir' is not a copula in language [pt]
[Line 182 Sent CP18-6 Node 25]: [L3 Syntax right-to-left-conj] Relation 'conj' must go left-to-right.
[Line 155 Sent CP44-4 Node 29]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (29:ponto:fixed --> 28:o:det)
[Line 158 Sent CP77-5 Node 15]: [L3 Syntax rel-upos-aux] 'aux' should be 'AUX' but it is 'VERB'
[Line 149 Sent CP77-5 Node 15]: [L3 Syntax leaf-aux-cop] 'aux' not expected to have children (15:terá:aux --> 6:aumento:nsubj)
[Line 19 Sent CP83-1 Node 14]: [L3 Syntax rel-upos-det] 'det' should be 'DET' or 'PRON' but it is 'PROPN'
[Line 149 Sent CP123-7 Node 16]: [L3 Syntax leaf-aux-cop] 'cop' not expected to have children (16:foi:cop --> 2:caso:obl)
[Line 78 Sent CP146-2 Node 5]: [L3 Syntax leaf-aux-cop] 'cop' not expected to have children (5:foi:cop --> 4:que:nsubj)
[Line 42 Sent CP165-2 Node 10]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (10:ali:mark --> 3:que:mark)
[Line 45 Sent CP165-2 Node 10]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (10:ali:mark --> 6:massa:nsubj)
[Line 48 Sent CP165-2 Node 10]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (10:ali:mark --> 9:está:cop)
[Line 155 Sent CP176-7 Node 7]: [L3 Syntax leaf-aux-cop] 'cop' not expected to have children (7:foi:cop --> 3:adaptação:nsubj)
[Line 141 Sent CP201-3 Node 41]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (41:relação:fixed --> 42:a:fixed)
[Line 189 Sent CP237-5 Node 9]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (9:que:fixed --> 10:é:cop)
[Line 191 Sent CP237-5 Node 9]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (9:que:fixed --> 12:Europa:nsubj)
[Line 189 Sent CP263-8 Node 30]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (30:de:fixed --> 31:o:fixed)
[Line 190 Sent CP263-8 Node 30]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (30:de:fixed --> 32:que:fixed)
[Line 212 Sent CP282-4 Node 56]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (56:apesar:mark --> 70:proporem:advcl)
[Line 139 Sent CP335-5 Node 17]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (17:tão:mark --> 20:ver:advcl)
[Line 7 Sent CP349-1 Node 2]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (2:é:fixed --> 3:que:fixed)
[Line 152 Sent CP371-3 Node 54]: [L3 Syntax leaf-aux-cop] 'aux' not expected to have children (54:tendo:aux --> 59:Câmara:nsubj)
[Line 87 Sent CP372-3 Node 27]: [L3 Syntax right-to-left-flat] Relation 'flat:name' must go left-to-right.
[Line 19 Sent CP398-2 Node 4]: [L3 Syntax rel-upos-det] 'det' should be 'DET' or 'PRON' but it is 'PROPN'
[Line 369 Sent CP408-11 Node 9]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (9:maneira:fixed --> 16:têm:advcl)
[Line 406 Sent CP408-12]: [L2 Metadata missing-spaceafter] 'SpaceAfter=No' is missing in the MISC field of node #16 because the text is 'deixam, por isso, de[...]'.
[Line 212 Sent CP434-4 Node 73]: [L3 Syntax right-to-left-conj] Relation 'conj' must go left-to-right.
[Line 111 Sent CP539-3 Node 27]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (27:ponto:fixed --> 26:o:det)
[Line 418 Sent CP566-9 Node 6]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (6:de:fixed --> 7:o:fixed)
[Line 419 Sent CP566-9 Node 6]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (6:de:fixed --> 8:que:fixed)
[Line 208 Sent CP584-7 Node 3]: [L3 Syntax right-to-left-appos] Relation 'appos' must go left-to-right.
[Line 13 Sent CP586-1 Node 10]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (10:par:fixed --> 8:estar:cop)
[Line 19 Sent CP586-1 Node 10]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (10:par:fixed --> 13:situação:nmod)
[Line 166 Sent CP606-10 Node 5]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (5:ponto:fixed --> 4:o:det)
[Line 287 Sent CP643-10 Node 4]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (4:tanto:mark --> 20:tirava:ccomp)
[Line 113 Sent CP680-3 Node 24]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (24:par:fixed --> 27:fundos:nmod)
[Line 85 Sent CP683-2 Node 38]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (38:dado:mark --> 42:grau:nsubj)
[Line 357 Sent CP693-10 Node 35]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (35:aí:mark --> 33:que:nsubj)
[Line 358 Sent CP693-10 Node 35]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (35:aí:mark --> 34:estarão:cop)
[Line 385 Sent CP721-5 Node 60]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (60:outras:mark --> 56:de:case)
[Line 388 Sent CP721-5 Node 60]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (60:outras:mark --> 59:quaisquer:det)
[Line 150 Sent CP729-8 Node 7]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (7:ponto:fixed --> 6:o:det)
[Line 15 Sent CP746-1 Node 10]: [L3 Syntax orphan-parent] The parent of 'orphan' should normally be 'conj' but it is 'nsubj'.
[Line 12 Sent CP762-1 Node 8]: [L3 Syntax right-to-left-conj] Relation 'conj' must go left-to-right.
[Line 66 Sent CP844-2 Node 24]: [L3 Syntax right-to-left-conj] Relation 'conj' must go left-to-right.
[Line 107 Sent CP860-3 Node 21]: [L3 Syntax leaf-aux-cop] 'aux' not expected to have children (21:ficaria:aux --> 13:e:mark)
[Line 109 Sent CP860-3 Node 21]: [L3 Syntax leaf-aux-cop] 'aux' not expected to have children (21:ficaria:aux --> 15:mandato:nsubj)
[Line 8 Sent CP862-1 Node 1]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (1:Contrariamente:mark --> 3:o:amod)
[Line 110 Sent CP901-3 Node 66]: [L3 Syntax right-to-left-flat] Relation 'flat:name' must go left-to-right.
[Line 42 Sent CP912-2 Node 3]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (3:assim:fixed --> 1:E:cc)
[Line 50 Sent CP912-2 Node 3]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (3:assim:fixed --> 9:bola:obl)
[Line 55 Sent CP918-2 Node 3]: [L3 Syntax rel-upos-aux] 'aux' should be 'AUX' but it is 'VERB'
[Line 188 Sent CP940-5 Node 34]: [L3 Syntax leaf-mark-case] 'mark' not expected to have children (34:além:mark --> 33:para:case)
[Line 74 Sent CP953-2]: [L2 Metadata missing-spaceafter] 'SpaceAfter=No' is missing in the MISC field of node #10 because the text is 'está, deliberadament[...]'.
[Line 79 Sent CP956-3 Node 25]: [L3 Syntax leaf-aux-cop] 'cop' not expected to have children (25:foi:cop --> 24:que:nsubj)
[Line 419 Sent CP968-10 Node 62]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (62:ponto:fixed --> 61:o:det)
[Line 103 Sent CP975-4 Node 29]: [L3 Syntax rel-upos-det] 'det' should be 'DET' or 'PRON' but it is 'PROPN'
[Line 68 Sent CP981-3 Node 15]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (15:relação:fixed --> 13:como:cc)
[Line 344 Sent CP1003-10 Node 13]: [L3 Syntax rel-upos-det] 'det' should be 'DET' or 'PRON' but it is 'PROPN'
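
To see which kinds of errors dominate a report like the one above, the lines can be tallied by test ID (the third token inside the second pair of brackets, e.g. `leaf-fixed`). The following is a minimal sketch, assuming every line follows the layout shown in this issue; the regular expression is inferred from those lines only, and `parse_report` and `count_by_test` are hypothetical helper names, not part of any UD tool.

```python
import re
from collections import Counter

# Pattern inferred from the report lines in this issue (an assumption,
# not an official format description). "Node N" is optional.
LINE_RE = re.compile(
    r"^\[Line (?P<line>\d+) Sent (?P<sent>[^\s\]]+)(?: Node (?P<node>\d+))?\]: "
    r"\[L(?P<level>\d+) (?P<cls>\w+) (?P<test>[\w-]+)\] (?P<msg>.*)$"
)


def parse_report(text):
    """Return one dict per report line that matches the assumed layout."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def count_by_test(entries):
    """Count how many entries were reported for each test ID."""
    return Counter(entry["test"] for entry in entries)


# Four lines copied verbatim from the report above.
SAMPLE = """\
[Line 63 Sent CF144-3 Node 15]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (15:ponto:fixed --> 14:o:det)
[Line 15 Sent CF161-1 Node 10]: [L3 Syntax rel-upos-det] 'det' should be 'DET' or 'PRON' but it is 'PROPN'
[Line 88 Sent CF344-3]: [L2 Morpho repeated-feature] Repeated features are disallowed: 'Gender=Masc|Number=Sing|PronType=Art|PronType=Dem'.
[Line 155 Sent CP44-4 Node 29]: [L3 Syntax leaf-fixed] 'fixed' not expected to have children (29:ponto:fixed --> 28:o:det)
"""

if __name__ == "__main__":
    for test_id, count in count_by_test(parse_report(SAMPLE)).most_common():
        print(f"{count:3d}  {test_id}")
```

Pasting the full report into `SAMPLE` (or reading it from a file) gives a per-test breakdown, which helps when splitting the fixes into small pull requests.
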

alvelvis commented 5 years ago

How do you distinguish serious errors from "normal" ones? I will be fixing errors with small pull requests, too.

arademaker commented 5 years ago

Two criteria:

  1. The ones we can realistically fix given their number: 72 cases can be handled in a few days.
  2. Treebanks with punctuation-related errors were accepted in the last release, and I assume that rule will be kept. Other errors could prevent our treebank from participating in the UD 2.5 release.

arademaker commented 3 years ago

Issue #265 fixed the second point above. No validation errors remain in the corpus.