Open nschneid opened 6 years ago
Possibly related old issues: https://github.com/UniversalDependencies/docs/issues/112 https://github.com/UniversalDependencies/docs/issues/181
I want to reiterate this problem of short abbreviation tagging.
The classic example is the short form `vs.` or just `v.`, used in most legal text instead of the full word `versus`.
When annotating the text with udpipe, the `english_ewt` model keeps the period inside the token (but still isn't able to lemmatize it to VERSUS), while the `english_partut` model treats the period as a separate token and abruptly ends the sentence. So we have a problem here that may be serious enough for legal text.
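
To make the two behaviours concrete, here is a minimal, self-contained sketch in plain Python. It does not call udpipe or reproduce either model. The abbreviation table, the function names, and the example sentence are all assumptions made for illustration. It only shows why detaching the period from `v.` splits the sentence early, and how keeping the period inside the token also gives the lemmatizer something to look up.

```python
# Hypothetical abbreviation table (an assumption for this sketch,
# not taken from any UD treebank or udpipe model).
ABBREVIATIONS = {"vs.": "versus", "v.": "versus"}


def tokenize(text, abbreviations=ABBREVIATIONS):
    """Whitespace-split, detaching a trailing period unless the chunk
    is a known abbreviation (which keeps its period)."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in abbreviations:
            tokens.append(chunk)                # period stays in the token
        elif chunk.endswith(".") and len(chunk) > 1:
            tokens.extend([chunk[:-1], "."])    # period becomes its own token
        else:
            tokens.append(chunk)
    return tokens


def split_sentences(tokens):
    """Naive splitter: every standalone '.' token ends a sentence."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok == ".":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def lemmatize(token, abbreviations=ABBREVIATIONS):
    """Expand known abbreviations; otherwise just lowercase the token."""
    return abbreviations.get(token.lower(), token.lower())


text = "Roe v. Wade was decided in 1973."

# Period kept inside the abbreviation: one sentence, and a usable lemma.
aware = tokenize(text)
aware_sentences = split_sentences(aware)

# No abbreviation knowledge: the period after "v" ends the sentence early.
naive = tokenize(text, abbreviations={})
naive_sentences = split_sentences(naive)
```

With the abbreviation table, `aware_sentences` holds a single sentence and `lemmatize("v.")` yields `versus`. With an empty table, the same input is cut into two sentences after `v`. A real fix would have to live in the treebank annotation or the model rather than in a lookup table like this one.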
Spawned off of #513 and UniversalDependencies/UD_English#40.
I proposed:
There has been further discussion about single-token abbreviations that would expand to multiple words, and whether to expand frequent single-word abbreviations.