msklvsk closed this issue 5 years ago
The new validator (for Python 3) reports both the line number and the sentence id:
./validate.sh --lang uk --max-err=10 UD_Ukrainian-IU/uk_iu-ud-dev.conllu
[Line 2789 Sent 142p]: Punctuation must not cause non-projectivity of nodes [31]
[Line 3867 Sent 2syn]: 'cc' not expected to have children ({'punct'})
[Line 4818 Sent 1l2x]: 'cop' not expected to have children ({'advmod'})
[Line 5097 Sent 1l8v]: 'cop' not expected to have children ({'advmod'})
[Line 5722 Sent 1lok]: 'advmod' should be 'ADV' but it is 'DET'
[Line 6250 Sent 1m1g]: 'aux' not expected to have children ({'advmod'})
[Line 7292 Sent 1wwz]: 'cop' not expected to have children ({'advmod'})
[Line 9331 Sent 1yg5]: 'cop' not expected to have children ({'advmod'})
[Line 9332 Sent 1yg5]: 'advmod' should be 'ADV' but it is 'DET'
...suppressing further errors regarding Syntax
*** FAILED *** with 17 errors
Syntax errors: 17
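For anyone post-processing this output, here is a minimal sketch of pulling the line number and sentence id out of each message. It assumes only the `[Line N Sent ID]: message` shape visible in the output above. The names `ERROR_RE` and `parse_error` are made up for illustration and are not part of the validator.

```python
import re

# Matches messages shaped like "[Line 2789 Sent 142p]: some text".
# This pattern is inferred from the sample output, not taken from the validator source.
ERROR_RE = re.compile(r"\[Line (?P<line>\d+) Sent (?P<sent>[^\]]+)\]: (?P<msg>.*)")


def parse_error(text):
    """Return (line_number, sent_id, message), or None if the text has another shape."""
    match = ERROR_RE.match(text.strip())
    if match is None:
        return None
    return int(match.group("line")), match.group("sent"), match.group("msg")


example = "[Line 5722 Sent 1lok]: 'advmod' should be 'ADV' but it is 'DET'"
print(parse_error(example))
```

Summary lines such as `Syntax errors: 17` do not match the pattern and come back as `None`.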
Cool!
Off topic, I have some objections to the new validation rules.
appos, put after the quotes, in regard to something inside the quotes. Seems legit?
cc is really unnecessary but the author put it there.
cop children are negations and should indeed be moved up (and internally labeled with #phrase-modification). However, what about [Gloss]: She not was the-best, she is the-best!
advmod:det is a lang-specific phenomenon. Not sure how to analyze it in a better way.
Are the new validation rules mature enough so a separate issue can be opened to discuss them?
The new validation rules are mature enough for separate issues to be opened (preferably in the docs issue tracker) about individual rules. Otherwise they are not mature, which means that discussion is welcome :-)
[Tree number 4093 on line 86571]: Mismatch between the text attribute and the FORM field.
You cat *.conllu and pipe the output to validate.py to check for duplicate ids and whatnot. After the script is done with the first file, line numbers become non-informative. # sent_id values would reliably identify the erroneous sentence.
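To illustrate the problem, here is a rough sketch of mapping a line number from the concatenated stream back to a file, a local line number, and the enclosing `# sent_id`. The `locate` helper is hypothetical and not part of validate.py. It assumes every file ends with a newline, so that `cat` does not merge lines across file boundaries.

```python
def locate(paths, global_line):
    """Map a 1-based line number in the concatenation of `paths`
    to (path, local_line, sent_id); return None if out of range."""
    seen = 0  # lines consumed so far across all files
    for path in paths:
        sent_id = None  # id of the sentence currently being read
        with open(path, encoding="utf-8") as handle:
            for local_line, line in enumerate(handle, start=1):
                seen += 1
                if line.startswith("# sent_id") and "=" in line:
                    sent_id = line.split("=", 1)[1].strip()
                if seen == global_line:
                    return path, local_line, sent_id
                if not line.strip():
                    # A blank line ends the sentence in CoNLL-U.
                    sent_id = None
    return None
```

With the sentence id in the message itself, this kind of bookkeeping is no longer needed.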