The current validation script permits many feature/upos combinations that occurred in the pre-cleaned-up data. It would be good to remove those to increase the usefulness of the validation for future releases.
For example, Gender is currently permitted with all UPOS tags except PUNCT and CCONJ. In the cleaned-up treebank, it's only necessary for ADJ, ADP, AUX, DET, NOUN, PRON, PROPN (and AUX only because of some combined forms "sé").
Have reviewed all of these.
The only two I didn't update accordingly were Typo=Yes and Foreign=Yes: while they may not be applied to every UPOS yet, in theory they can be (apart from SYM).
Here's the full survey:
https://gist.github.com/kscanne/4d4288aae7506f5e202b5a9d5e907e12
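To make the proposal concrete, here is a rough sketch of what a tightened check could look like. It is not how the validation script is actually implemented: the names `ALLOWED_UPOS`, `disallowed_features` and `check_conllu_line` are made up for illustration, and only the Gender entry is taken from the list above. Features with no entry are left unrestricted.

```python
# Hypothetical whitelist: feature name -> UPOS tags it may occur with.
# Only the Gender entry comes from this issue; the rest would come from the survey.
ALLOWED_UPOS = {
    "Gender": {"ADJ", "ADP", "AUX", "DET", "NOUN", "PRON", "PROPN"},
}


def disallowed_features(upos, feats):
    """Return feature names in a CoNLL-U FEATS string not permitted with upos."""
    if feats == "_":
        return []
    bad = []
    for pair in feats.split("|"):
        name = pair.split("=", 1)[0]
        allowed = ALLOWED_UPOS.get(name)
        # Features without an entry are not restricted in this sketch.
        if allowed is not None and upos not in allowed:
            bad.append(name)
    return bad


def check_conllu_line(line):
    """Check one CoNLL-U token line; other lines (comments, blanks) pass."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        return []
    # UPOS is the 4th column and FEATS the 6th in CoNLL-U.
    return disallowed_features(cols[3], cols[5])
```

With this table, `disallowed_features("VERB", "Gender=Masc|Number=Sing")` flags `Gender`, while the same features on a `NOUN` pass.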