Open nschneid opened 5 years ago
Also conventionalized colloquial truncations of words, like "info" for "information", "meds" for "medications", "limo" for "limousine", "fab" for "fabulous", and "physio" for "physiotherapist".
This is a tricky issue, thanks for pointing it out... For comparison, in UD_English-GUM the lemmas do normalize clear errors, but not abbreviations. One of the main criteria we use is "if the writer had been made aware of the issue, would they have spelled it differently?" Here are some cases where we answered yes:
Items like 'physio' would probably be left alone in GUM as a kind of synonym (essentially the idea is that the writer can choose between the lexical item physio and physiotherapy). One argument is maybe independent morphology: so I'm not sure about 'physios', but I think you can definitely say 'limos', and maybe 'fab' is comparable (fabber? more fab?).
We have plenty of multiword abbreviations and we lemmatize them as themselves (OMG stays OMG). POS choice is also tricky there, and we base it on the expanded form's head (e.g. we've tagged CMV for "Change My View" as an imperative verb!)
Ideally we would systematically mark these with the feature `Abbr=Yes`. Currently this feature is mainly being used for colloquial shortenings ("ppl", "prolly").
Should the lemma spell out the word, e.g. to disambiguate "St." as "Street" vs. "Saint"? What if it's an abbreviation of multiple words ("OMFG")?
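To make the `Abbr=Yes` suggestion concrete, here is a minimal sketch of how the feature could be added to the FEATS column of CoNLL-U token lines. The function names (`add_feature`, `mark_abbr`), the word list, and the example token line are all hypothetical illustrations, not part of any treebank's tooling. The sketch leaves the lemma untouched, since whether to expand it is exactly the open question here.

```python
def add_feature(feats, name, value):
    """Return a CoNLL-U FEATS string with name=value set.

    Features are kept sorted alphabetically, ignoring case,
    as the CoNLL-U format expects. "_" means no features.
    """
    pairs = {} if feats == "_" else dict(
        item.split("=", 1) for item in feats.split("|")
    )
    pairs[name] = value
    ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{k}={v}" for k, v in ordered)


def mark_abbr(line, shortenings):
    """Add Abbr=Yes to a token line whose FORM is in `shortenings`.

    Comment lines, blank lines, and multiword-token or empty-node
    lines (IDs like "1-2" or "1.1") are returned unchanged.
    """
    cols = line.split("\t")
    if len(cols) != 10 or not cols[0].isdigit():
        return line
    if cols[1].lower() in shortenings:
        cols[5] = add_feature(cols[5], "Abbr", "Yes")  # column 6 is FEATS
    return "\t".join(cols)


# Hypothetical word list and token line, for illustration only.
SHORTENINGS = {"ppl", "prolly", "info", "meds"}
token = "3\tppl\tppl\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_"
marked = mark_abbr(token, SHORTENINGS)
```

A list-based lookup like this would only catch forms someone has already enumerated, so it is better suited to a consistency check over existing annotations than to deciding which words count as abbreviations.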