Closed matyaskopp closed 1 year ago
@TomazErjavec, currently, L2 validation produces just warnings, so if someone has wrong features, it does not fail.
Do you think that Morpho errors should produce errors?
Do you think that Morpho errors should produce errors?
Yes, I do. Level 1 errors = errors, Level 2 errors = warnings would be my suggestion.
ok, leaving it as it is now:
But I think we can be stricter, at least for morphology. If someone provides a spaceless random mess in @msd, it shows only a warning.
But I think we can be stricter, at least for morphology.
Yes, I agree(d), I guess I wasn't clear before. What I meant to say is that if morphology is not OK, that should be an error. If syntax is not OK, that should probably be just a warning.
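For illustration, the policy agreed above (Level 1 problems and morphology problems are errors, other Level 2 problems are warnings) could be sketched as below. This is not the actual ParlaMint validation code; the function name `severity` and the sample `Morpho` message are assumptions, and the message layout is inferred from validator output of the form `[L2 Syntax 0-is-not-root]`.

```python
import re

# Matches the tag part of a validator message, e.g. "[L2 Syntax 0-is-not-root]".
# The leading "[Line 516 Sent ...]" part does not match because "L" must be
# followed by a digit.
MESSAGE_RE = re.compile(r"\[L(?P<level>\d) (?P<cls>\w+) (?P<test>[\w-]+)\]")


def severity(message: str) -> str:
    """Classify one validator message (hypothetical helper, not project code)."""
    match = MESSAGE_RE.search(message)
    if match is None:
        return "unknown"
    level = int(match.group("level"))
    cls = match.group("cls")
    # Level 1 problems and any morphology problem are fatal;
    # everything else (e.g. L2 Syntax, L2 Format) only warns.
    if level == 1 or cls == "Morpho":
        return "error"
    return "warning"


if __name__ == "__main__":
    samples = [
        "[Line 516 Sent X]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.",
        # Illustrative message only; the exact wording is an assumption.
        "[Line 12 Sent X]: [L2 Morpho invalid-feature] Invalid feature.",
    ]
    for sample in samples:
        print(severity(sample), "<-", sample)
```

A build step could then fail only when at least one message is classified as `error`.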
Done. We will see if it works once ES-CT is synced: https://github.com/IULATERM-TRL-UPF/ParlaMint/pull/3
I have a question about this validation. When I run "make conllu-ES-CT" I get some errors:
[Line 516 Sent ParlaMint-ES-CT_2018-01-17-0101.1.0.8.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Tree number 12 on line 506 Sent ParlaMint-ES-CT_2018-01-17-0101.1.0.8.1]: [L2 Syntax multiple-roots] Multiple root words: [2, 10]
[Tree number 12 on line 506 Sent ParlaMint-ES-CT_2018-01-17-0101.1.0.8.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Tree number 23 on line 947 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.2.1]: [L2 Syntax multiple-roots] Multiple root words: [4, 36]
[Tree number 23 on line 947 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.2.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Line 1470 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.5.4]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Line 1470 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.5.4]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Tree number 34 on line 1378 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.5.4]: [L2 Syntax multiple-roots] Multiple root words: [1, 3, 4]
[Tree number 34 on line 1378 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.5.4]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Tree number 38 on line 1613 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.6.1]: [L2 Syntax multiple-roots] Multiple root words: [4, 16]
[Tree number 38 on line 1613 Sent ParlaMint-ES-CT_2018-01-17-0101.2.0.6.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Line 3016 Sent ParlaMint-ES-CT_2018-01-17-0101.3.0.0.1]: [L2 Syntax head-self-loop] HEAD == ID for 28
[Tree number 76 on line 2989 Sent ParlaMint-ES-CT_2018-01-17-0101.3.0.0.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Line 3518 Sent ParlaMint-ES-CT_2018-01-17-0101.3.0.7.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Tree number 83 on line 3425 Sent ParlaMint-ES-CT_2018-01-17-0101.3.0.7.1]: [L2 Syntax multiple-roots] Multiple root words: [2, 93]
[Tree number 83 on line 3425 Sent ParlaMint-ES-CT_2018-01-17-0101.3.0.7.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Line 3584 Sent ParlaMint-ES-CT_2018-01-17-0101.5.0.0.1]: [L2 Syntax head-self-loop] HEAD == ID for 4
[Tree number 87 on line 3581 Sent ParlaMint-ES-CT_2018-01-17-0101.5.0.0.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Line 4126 Sent ParlaMint-ES-CT_2018-01-17-0101.5.0.17.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Tree number 104 on line 4114 Sent ParlaMint-ES-CT_2018-01-17-0101.5.0.17.1]: [L2 Syntax multiple-roots] Multiple root words: [2, 12]
[Tree number 104 on line 4114 Sent ParlaMint-ES-CT_2018-01-17-0101.5.0.17.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Line 5488 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.6.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Line 5488 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.6.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Line 5488 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.6.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Line 5488 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.6.1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.
[Tree number 150 on line 5479 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.6.1]: [L2 Syntax multiple-roots] Multiple root words: [1, 2, 6, 8]
[Tree number 150 on line 5479 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.6.1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
[Tree number 162 on line 5723 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.12.2]: [L2 Syntax multiple-roots] Multiple root words: [5, 10]
[Tree number 162 on line 5723 Sent ParlaMint-ES-CT_2018-01-17-0101.16.0.12.2]: [L2 Format skipped-corrupt-tree] Skipping annotation tests because of corrupt tree structure.
Format errors: 10
Syntax errors: 19
FAILED with 29 errors
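To get an overview of output like this, a small script can tally the messages by class and by test name. The following is only a sketch: the helper name `summarize` is an assumption, and the regular expression is inferred from the message layout shown in the log rather than taken from the validator itself.

```python
import re
from collections import Counter

# Captures level, class and test name from tags such as "[L2 Syntax multiple-roots]".
ENTRY_RE = re.compile(r"\[L(\d) (\w+) ([\w-]+)\]")


def summarize(log_text: str):
    """Count validator messages per class and per test (hypothetical helper)."""
    by_class = Counter()
    by_test = Counter()
    # findall works whether the messages are one per line or run together.
    for _level, cls, test in ENTRY_RE.findall(log_text):
        by_class[cls] += 1
        by_test[test] += 1
    return by_class, by_test


if __name__ == "__main__":
    sample = (
        "[Line 516 Sent S1]: [L2 Syntax 0-is-not-root] DEPREL must be 'root' if HEAD is 0.\n"
        "[Tree number 12 on line 506 Sent S1]: [L2 Syntax multiple-roots] Multiple root words: [2, 10]\n"
        "[Tree number 12 on line 506 Sent S1]: [L2 Format skipped-corrupt-tree] Skipping annotation tests.\n"
    )
    classes, tests = summarize(sample)
    print(dict(classes))
    print(dict(tests))
```

Running it over the full validator output would show whether the problems are concentrated in a few sentences or spread across the corpus.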
How can I fix it?
Not so simple to fix. As many annotation tools seem to produce these bugs, we have changed this to a warning (see above), so you could leave it as is. But if you want to fix it, see #474 for some suggestions and discussion.
How can I fix it?
These kinds of errors are not deal-breaking errors. We do not insist on L2 syntax and L2 format validity, but these should be rare errors - only in some obscure sentences, not over the whole corpus.
I can see two possible solutions:
dep relation and 0-is-not-root with root relation
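One possible reading of this suggestion, as a rough post-processing sketch: keep a single root per sentence, relabel it `root`, and re-attach any further HEAD-0 tokens to it with the `dep` relation. The function name `repair_roots` and the choice to keep the first HEAD-0 token as the root are assumptions made for illustration; the result is linguistically arbitrary, and `head-self-loop` problems are not handled here.

```python
# Column positions in a 10-column CoNLL-U token row.
ID, HEAD, DEPREL = 0, 6, 7


def repair_roots(rows):
    """Return a copy of one sentence's token rows with a single root.

    rows: list of CoNLL-U rows, each a list of 10 strings.
    Hypothetical helper; keeping the FIRST HEAD-0 token as root is an assumption.
    """
    root_id = None
    fixed = []
    for row in rows:
        row = list(row)  # do not modify the caller's data
        if row[HEAD] == "0":
            if root_id is None:
                # First token attached to 0: make sure it is labelled 'root'.
                root_id = row[ID]
                row[DEPREL] = "root"
            else:
                # Further HEAD-0 tokens: attach to the kept root with 'dep'.
                row[HEAD] = root_id
                row[DEPREL] = "dep"
        fixed.append(row)
    return fixed


if __name__ == "__main__":
    sentence = [
        ["1", "Bon", "bo", "ADJ", "_", "_", "0", "amod", "_", "_"],
        ["2", "dia", "dia", "NOUN", "_", "_", "1", "obj", "_", "_"],
        ["3", ".", ".", "PUNCT", "_", "_", "0", "root", "_", "_"],
    ]
    for row in repair_roots(sentence):
        print("\t".join(row))
```

Multiword-token and empty-node rows have `_` in the HEAD column, so they pass through unchanged.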
https://github.com/clarin-eric/ParlaMint/actions/runs/3711782252/jobs/6293360360#step:4:261