UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Flat:foreign with Typo=Yes #1025

Closed by lauma 5 months ago

lauma commented 6 months ago

The validator gives warnings of type flat-foreign-upos-feats. Mostly they are very useful and help find misannotations; however, I would like to ask whether an exception for "Typo=Yes" would be reasonable.

Our example is a sentence that is in Latvian overall but contains the words "fasf food". As our annotators all speak English, it was recognised as a misspelling of "fast food" and marked as a typo. On the other hand, if the insertion were a typo in a language the annotator didn't know, a similar typo would go unrecognised. Should we forgo annotating typos in foreign languages altogether for the sake of more consistent annotation?

jnivre commented 6 months ago

I guess the validator's behaviour is based on the assumption that, if one chooses to use "flat" together with the feature Foreign=Yes to annotate foreign material, then this implies that there will be no analysis of the material (over and above the fact that it is in a different language). But this is not the only way to analyse foreign material, not even the preferred one if the foreign expression is analysable, as described in this recent addition to the guidelines: https://universaldependencies.org/foreign.html

It seems that an expression like "fast food" would be a good candidate for being analysed using a "code-switched" or "borrowed" analysis (depending on how well integrated it is in Latvian).

amir-zeldes commented 6 months ago

Either way I would like for it to be possible to flag typos in things we consider to be foreign, incl. wholesale quotations that are clearly not borrowings etc. On the other hand, if it's just a warning and not an error then I think it's appropriate - it's just alerting developers to an unusual situation. Maybe a mechanism to acknowledge and silence the warning for known cases is the way to go?
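
To make the idea of acknowledging known cases concrete, here is a minimal sketch. It is purely hypothetical: it does not describe an existing validator feature, and the names `ValidatorWarning` and `filter_acknowledged`, as well as the sentence and token ids, are invented for illustration. The assumption is that each warning can be identified by a sentence id, a token id and a test id, and that a treebank maintainer keeps a set of acknowledged cases.

```python
# Hypothetical sketch: none of these names exist in the UD validator.
from typing import NamedTuple


class ValidatorWarning(NamedTuple):
    sent_id: str   # sentence identifier (illustrative values below)
    token_id: int  # token index within the sentence
    test_id: str   # e.g. "flat-foreign-upos-feats"


def filter_acknowledged(warnings, acknowledged):
    """Drop warnings that a maintainer has explicitly acknowledged."""
    return [w for w in warnings if w not in acknowledged]


warnings = [
    ValidatorWarning("s1", 4, "flat-foreign-upos-feats"),
    ValidatorWarning("s2", 7, "flat-foreign-upos-feats"),
]
# The maintainer has reviewed the first case and accepts it as intended.
acknowledged = {ValidatorWarning("s1", 4, "flat-foreign-upos-feats")}
remaining = filter_acknowledged(warnings, acknowledged)
```

Only the unreviewed warning is left in `remaining`, so new cases still surface while known ones stay quiet.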

dan-zeman commented 6 months ago

> Maybe a mechanism to acknowledge and silence the warning for known cases is the way to go?

I suppose in this case the easiest way of silencing the warning is to use the code-switching analysis. BTW, if I recall correctly, the warning is not triggered by the feature Foreign=Yes (which would be used in the code-switching analysis as well) but by the subtype flat:foreign. Plain flat without a subtype cannot trigger it because it can also be used for other things.
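
The trigger condition described in this thread can be sketched roughly as follows. This is an illustration and not the validator's actual code: it assumes the warning fires when a token attached via flat:foreign has a UPOS other than X or any features other than Foreign=Yes, which is a reading of the test name flat-foreign-upos-feats and of the discussion above. The function names and the `allow_typo` switch (modelling the exception proposed in this issue) are hypothetical.

```python
# Illustrative sketch only; function names and allow_typo are hypothetical.

def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Foreign=Yes|Typo=Yes' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def flat_foreign_warning(upos, feats, deprel, allow_typo=False):
    """Return True if the token would trigger the (assumed) warning."""
    # Only the subtype flat:foreign is checked; plain flat is left alone.
    if deprel != "flat:foreign":
        return False
    features = parse_feats(feats)
    if allow_typo:
        # Proposed exception: ignore Typo when comparing features.
        features.pop("Typo", None)
    return upos != "X" or features != {"Foreign": "Yes"}


# A token like "fasf" annotated with Typo=Yes:
strict = flat_foreign_warning("X", "Foreign=Yes|Typo=Yes", "flat:foreign")
relaxed = flat_foreign_warning("X", "Foreign=Yes|Typo=Yes", "flat:foreign",
                               allow_typo=True)
```

Under these assumptions, `strict` is true and `relaxed` is false, while the same token attached with plain flat would not be checked at all.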