UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

fix a bug in get_caused_nonprojectivities #67

Closed by martinpopel 4 years ago

martinpopel commented 4 years ago

Cf. the discussion at https://github.com/UniversalDependencies/tools/issues/66#issuecomment-635343892 and the implementation of a similar function in Udapi (except that it does not try to exclude non-projectivities caused by the parent): https://github.com/udapi/udapi-python/blob/1e4004f5/udapi/core/node.py#L654-L674

Running validate.py on non-proj.conllu (provided below) should result in a single error:

[Line 5 Sent non-proj Node 3]: [L3 Syntax punct-is-nonproj] Punctuation must not be attached non-projectively over nodes [2]
Syntax errors: 1

However, before this bugfix, validate.py reported an extra (false-alarm) error:

[Line 5 Sent non-proj Node 3]: [L3 Syntax punct-causes-nonproj] Punctuation must not cause non-projectivity of nodes [4]

These false-alarm punct-causes-nonproj errors occurred only in the presence of punct-is-nonproj errors (I think this can be proven mathematically), so the bug in validate.py did not result in errors being reported for trees that had no errors. Nevertheless, the bug should be fixed.

BTW: validate.py is becoming a difficult-to-maintain monster. I think it would benefit from using Udapi (or something similar) after checking the first level(s) of validity, but I am afraid of such refactoring.

1   A   A   NOUN    _   _   2   nsubj   _   _
2   B   B   VERB    _   _   0   root    _   _
3   ,   ,   PUNCT   _   _   1   punct   _   _
4   C   C   ADV     _   _   2   advmod  _   _
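
To make the expected behaviour on the four-token tree above concrete, here is a minimal sketch of the two notions involved. It is not the code from validate.py or Udapi: the function names `ancestors`, `gap_nodes` and `caused_nonprojectivities` and the `heads` dictionary are hypothetical, the input is assumed to be a well-formed tree (no cycles), and, like the Udapi function, the sketch does not try to exclude non-projectivities caused by the parent.

```python
# Hypothetical sketch, not the validate.py or Udapi implementation.
# `heads` maps each node ID to the ID of its parent (0 = artificial root).
# Assumption: `heads` encodes a well-formed tree without cycles.

def ancestors(node, heads):
    """Return the set of proper ancestors of `node`, excluding the root 0."""
    result = set()
    current = heads[node]
    while current != 0:
        result.add(current)
        current = heads[current]
    return result


def gap_nodes(node, heads):
    """Nodes lying strictly between `node` and its parent that are not
    descendants of that parent, i.e. the gap of a non-projective edge."""
    parent = heads[node]
    if parent == 0:
        # The edge from the artificial root is never treated as non-projective here.
        return []
    low, high = sorted((node, parent))
    return [x for x in range(low + 1, high) if parent not in ancestors(x, heads)]


def caused_nonprojectivities(node, heads):
    """Nodes whose incoming edge has `node` in its gap."""
    return [y for y in sorted(heads) if y != node and node in gap_nodes(y, heads)]


# The tree from non-proj.conllu: A <- B (root), "," attached to A, C attached to B.
heads = {1: 2, 2: 0, 3: 1, 4: 2}

print(gap_nodes(3, heads))                 # node 2 lies in the gap of the edge 1 -> 3
print(caused_nonprojectivities(3, heads))  # empty: node 3 is a descendant of node 2
```

Under these definitions, node 3 is attached non-projectively over node 2 (the punct-is-nonproj error), but it is not in the gap of the edge from 2 to 4, because node 3 is itself dominated by node 2 via node 1. That is why no punct-causes-nonproj error is expected for node 4.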