Closed martinpopel closed 7 years ago
Merged. But to put things straight: the validator was not failing with an exception. It caught and reported the exception. :) I think a solution would be to move the try ... except
block https://github.com/UniversalDependencies/tools/blob/master/validate.py#L706 tighter around the sentence. That would pretty much achieve what you suggest: i.e. fail and move on to the next sentence. Then again, if there is a wrong number of columns, it's fine to scream and quit - the treebank is broken and whoever is working on it will be fixing it anyway. I think it's much less important when and how the validator reports, providing it doesn't pass incorrect stuff through.
I fully agree.
@martinpopel I am a bit scared to rock the boat this late in the game, but if you have a bit of free time at your hands, it would be good to have a third pair of eyes look at the validator's test suite and see nothing is forgotten. In the past few releases, there's always been one treebank which managed to sneak wrong data through the validator. I'd hate to have that happen again. :slightly_frowning_face: Thanks! :+1:
@fginter: I am not aware of any missing tests in validate.py
.
I have some ideas for svalidation e.g. that SYM and PUNCT are not alphabetical.
I plan to devote my today's free time to fixing the treebanks which FAIL validation (and it seems their maintainers don't plan to fix it today before the data freeze).
and it should not fail with exception on invalid data, it should rather report which lines are invalid.
This commit prevents exceptions when a CoNLL-U contains e.g. just one column instead of 10 columns. It is not very elegant, I admit. Maybe it would be better to just jump to the next tree after reporting less than 10 columns.