Closed by bguil 2 years ago
In general I agree (not only for the `X` tag but perhaps for any tag). But I am hesitant to hard-code it in the validator when checking the boxes in the form is not too much work and it is then neatly visible alongside all other features.
There is one issue though that I have not solved yet and that makes the feature `Foreign` special anyway. The attribute `Lang=br` in MISC indicates that morphological features in FEATS, if present, are Breton rather than French. However, the feature `Foreign` should probably be a (hard-coded) exception because:
`Lang=br` is or is not in MISC. (Sometimes the code of the source language is not available, some corpora use only `Foreign=Yes` without `Lang=xx`, etc.)

I might also suggest taking a look at the guideline suggestions for UGC (Section 4.7) in our recent journal article:
Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations https://link.springer.com/content/pdf/10.1007/s10579-022-09581-9.pdf
The validator should now judge the `Foreign` feature according to the main language of the corpus, regardless of `Lang=xx` in MISC.
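
For readers who want to see the rule spelled out, here is a minimal sketch of how I understand it. It is not the actual code in `validate.py`; the helper names `parse_kv` and `feature_check_langs` are made up for illustration, and the only assumption is the standard CoNLL-U `Key=Value|Key=Value` layout of FEATS and MISC.

```python
def parse_kv(field):
    """Parse a CoNLL-U FEATS or MISC field ('A=b|C=d' or '_') into a dict."""
    if field == "_":
        return {}
    pairs = (item.split("=", 1) for item in field.split("|") if "=" in item)
    return {key: value for key, value in pairs}


def feature_check_langs(feats, misc, main_lang):
    """Map each feature name to the language whose feature list applies.

    Hypothetical helper: Foreign is always judged by the main language of
    the corpus; every other feature follows Lang=xx in MISC when present.
    """
    token_lang = parse_kv(misc).get("Lang", main_lang)
    return {
        name: main_lang if name == "Foreign" else token_lang
        for name in parse_kv(feats)
    }


# A Breton token inside a French corpus.
print(feature_check_langs("Foreign=Yes|Gender=Masc", "Lang=br", "fr"))
# → {'Foreign': 'fr', 'Gender': 'br'}
```

With this split, a corpus that only uses `Foreign=Yes` without `Lang=xx` is handled the same way, because the lookup falls back to the main language.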
Using `validate.py` for some French data, I had the following error:

for the CoNLL line:
I think it would be sensible to allow the feature `Foreign=Yes` on the `X` tag whatever the language.
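
To make the proposal concrete, this is roughly the check I have in mind, as a sketch only. The function name `foreign_on_x` is hypothetical and not part of `validate.py`, and the sample token line is invented for illustration, not taken from the data that triggered the error.

```python
def foreign_on_x(conllu_line):
    """Return True if a CoNLL-U token line has UPOS X and Foreign=Yes.

    Hypothetical pre-check: a validator could accept such tokens without
    consulting the per-language list of permitted features.
    """
    cols = conllu_line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("a CoNLL-U token line must have 10 columns")
    upos, feats = cols[3], cols[5]
    return upos == "X" and "Foreign=Yes" in feats.split("|")


# Invented example token (columns are tab-separated).
line = "1\tdemat\tdemat\tX\t_\tForeign=Yes\t0\troot\t_\tLang=br"
print(foreign_on_x(line))
```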