aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

Inaccurate POS tag #232

Status: Open. devikasondhi opened this issue 3 years ago.

devikasondhi commented 3 years ago

Hello,

Referring to the sample sentence "We will meet at eight o'clock on Thursday morning." in the POS.ipynb notebook: shouldn't the token "o'clock" be tagged as an adverb rather than a noun, as any standard English dictionary confirms?

floschne commented 3 years ago

Probably yes! But POS taggers are either probabilistic models or rule-based methods, so they are never 100% accurate. Even modern models using contextualized word embeddings have an error rate of some 5-10% on this task :)
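To make the error-rate figure concrete: a tagger's error rate is usually measured per token, as the share of tokens whose predicted tag differs from a gold-standard tag. The minimal sketch below computes that for the sentence from this issue. The tag sequences are hypothetical Universal-POS-style labels, not actual polyglot output: "o'clock" is gold-tagged ADV, as the reporter suggests, and predicted as NOUN to mimic the reported behaviour. `token_error_rate` is an illustrative helper, not part of polyglot.

```python
from typing import List


def token_error_rate(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose predicted tag differs from the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one token")
    errors = sum(p != g for p, g in zip(predicted, gold))
    return errors / len(gold)


# Hypothetical annotation of the sentence from the issue (not real polyglot output).
tokens = ["We", "will", "meet", "at", "eight", "o'clock", "on", "Thursday", "morning", "."]
gold = ["PRON", "AUX", "VERB", "ADP", "NUM", "ADV", "ADP", "PROPN", "NOUN", "PUNCT"]
# Same tags, except "o'clock" is mis-tagged as NOUN.
predicted = ["PRON", "AUX", "VERB", "ADP", "NUM", "NOUN", "ADP", "PROPN", "NOUN", "PUNCT"]

rate = token_error_rate(predicted, gold)
print(f"error rate: {rate:.0%}")  # 1 wrong tag out of 10 tokens
```

A single wrong tag in a ten-token sentence already yields a 10% error rate for that sentence, which is why tagger error rates are reported over large annotated corpora rather than judged from one example.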