aboSamoor / polyglot

Multilingual text (NLP) processing toolkit

Inaccurate POS tag #232

Open devikasondhi opened 3 years ago

devikasondhi commented 3 years ago


Referring to the sample text "We will meet at eight o'clock on Thursday morning." in the POS.ipynb notebook: shouldn't the token "o'clock" be tagged as an adverb instead of a noun, as can be verified with any standard English dictionary?

floschne commented 3 years ago

Probably yes! BUT: POS taggers are either probabilistic models or rule-based systems and are therefore NEVER 100% accurate. Even modern models using contextualized word embeddings still have an error rate of roughly 5-10% on this task :)
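
To put that error rate in perspective, here is a minimal sketch of how a per-token error rate could be computed for the sentence from the notebook. Only the `o'clock` tags come from this thread (NOUN as predicted, ADV as proposed); every other tag, and the helper names `token_error_rate` and `mismatched_tokens`, are illustrative assumptions and not polyglot's actual output or API.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (token, POS tag) pairs


def mismatched_tokens(predicted: Tagged, gold: Tagged) -> List[Tuple[str, str, str]]:
    """Return (token, predicted_tag, gold_tag) for every disagreement."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same number of tokens")
    return [
        (tok, p_tag, g_tag)
        for (tok, p_tag), (_, g_tag) in zip(predicted, gold)
        if p_tag != g_tag
    ]


def token_error_rate(predicted: Tagged, gold: Tagged) -> float:
    """Fraction of tokens whose predicted tag differs from the gold tag."""
    if not gold:
        raise ValueError("cannot compute an error rate for an empty sentence")
    return len(mismatched_tokens(predicted, gold)) / len(gold)


# Hypothetical tagger output: only the o'clock/NOUN pair is taken from this issue.
predicted = [
    ("We", "PRON"), ("will", "AUX"), ("meet", "VERB"), ("at", "ADP"),
    ("eight", "NUM"), ("o'clock", "NOUN"), ("on", "ADP"),
    ("Thursday", "PROPN"), ("morning", "NOUN"), (".", "PUNCT"),
]

# Hypothetical reference annotation: identical except for the proposed ADV tag.
gold = [(tok, "ADV" if tok == "o'clock" else tag) for tok, tag in predicted]

print(mismatched_tokens(predicted, gold))
print(f"{token_error_rate(predicted, gold):.0%}")  # 1 of 10 tokens differs -> 10%
```

Under these assumptions, a single disputed tag in a ten-token sentence already amounts to a 10% error rate for that sentence, so an isolated mistake like this one is consistent with a tagger that performs well on average.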