UniversalDependencies / UD_Irish-IDT

Irish data

Restrict permitted features in validation script #138

Closed by kscanne 3 years ago

kscanne commented 3 years ago

The current validation script permits many feature/UPOS combinations that occurred only in the data before it was cleaned up. It would be good to remove those so the validation is more useful for future releases.

For example, Gender is currently permitted with all UPOS tags except PUNCT and CCONJ. In the cleaned-up treebank, it's only needed for ADJ, ADP, AUX, DET, NOUN, PRON, and PROPN (and for AUX only because of some combined forms, e.g. "sé").
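To make the intended restriction concrete, here is a minimal sketch of a per-feature allow-list check. It is not the actual validation script: the names `ALLOWED_UPOS`, `parse_feats`, and `check_token` are hypothetical, and only the Gender entry is taken from the list above.

```python
# Hypothetical allow-list: feature name -> UPOS tags it may appear with.
# Only the Gender entry comes from this issue; other features would be
# filled in from the survey.
ALLOWED_UPOS = {
    "Gender": {"ADJ", "ADP", "AUX", "DET", "NOUN", "PRON", "PROPN"},
}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Gender=Masc|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def check_token(upos, feats, allowed=ALLOWED_UPOS):
    """Return the feature names that are not permitted with this UPOS tag.

    Features missing from the allow-list are not checked at all.
    """
    return sorted(
        name
        for name in parse_feats(feats)
        if name in allowed and upos not in allowed[name]
    )
```

With this sketch, `check_token("VERB", "Gender=Masc|Number=Sing")` flags `Gender`, while the same features on a NOUN pass.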

Here's the full survey:

https://gist.github.com/kscanne/4d4288aae7506f5e202b5a9d5e907e12
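For anyone wanting to reproduce a survey like this, a rough sketch follows. It is not necessarily how the gist was generated: the function name `survey` is hypothetical, and it assumes standard ten-column, tab-separated CoNLL-U input.

```python
from collections import defaultdict


def survey(conllu_text):
    """Map each feature name to the sorted list of UPOS tags it occurs with."""
    seen = defaultdict(set)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        if feats == "_":
            continue  # token carries no features
        for item in feats.split("|"):
            seen[item.split("=", 1)[0]].add(upos)
    return {name: sorted(tags) for name, tags in seen.items()}
```

Running it over the text of a `.conllu` file gives one entry per feature, which can then be compared against what the validation script permits.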

tlynn747 commented 3 years ago

I have reviewed all of these. The only two I didn't update accordingly were Typo=Yes and Foreign=Yes: while they may not yet occur with every UPOS tag, in theory they can be applied to any of them (apart from SYM).