aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

Unexpected POS result #42

Closed natsheh closed 8 years ago

natsheh commented 8 years ago

Hi Rami @aboSamoor,

In the following example, I expected the word 'restate' to get the 'VERB' POS tag; however, I got 'NUM':

from polyglot.text import Text

blob = """restate (words) from one language into another language."""
text = Text(blob)
text.pos_tags
text.words[0].pos_tag

u'NUM'

alantian commented 8 years ago

Hi @natsheh, this is a mistake in the model rather than a bug in the code.

This happens because 'restate' is an out-of-vocabulary word for Polyglot, so it is converted to an 'Unknown word' during POS tagging. The tagger then sees only that placeholder instead of the actual word, which leads it to produce the wrong POS tag.
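
To make the mechanism concrete, here is a toy sketch of out-of-vocabulary replacement. It is not Polyglot's actual code: the `UNK` placeholder, the tiny `VOCAB` set, and the `normalize` helper are all hypothetical names made up for illustration.

```python
# Toy illustration only: NOT Polyglot's real code or vocabulary.
UNK = "<UNK>"  # hypothetical placeholder for unknown words
VOCAB = {"words", "from", "one", "language", "into", "another"}  # hypothetical vocabulary


def normalize(tokens, vocab=VOCAB):
    """Replace every token that is not in the vocabulary with the UNK placeholder."""
    return [tok if tok.lower() in vocab else UNK for tok in tokens]


tokens = ["restate", "words", "from", "one", "language"]
normalized = normalize(tokens)
print(normalized)
```

After this step, every unknown word looks identical to the tagger, so the tag it gets depends only on what the model learned for the placeholder and its context, not on the word itself.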

natsheh commented 8 years ago

Hi @alantian and @aboSamoor, just a suggestion: maybe it would be safer to tag such 'Unknown words' with 'X: other' rather than 'NUM'. I saw a few such cases that were tagged 'NUM'. Thanks.
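
A rough sketch of what I mean, done as a post-processing step on the caller's side. This is only an assumption about how it could be handled, not something Polyglot provides: `retag_unknown` and the `known` set are hypothetical, and the input is assumed to be a list of (word, tag) pairs.

```python
def retag_unknown(tagged, is_known, fallback="X"):
    """Return (word, tag) pairs, replacing the tag with `fallback`
    whenever `is_known(word)` is False."""
    return [(word, tag if is_known(word) else fallback) for word, tag in tagged]


# Hypothetical data: a tiny known-word set and a tagger output to correct.
known = {"words", "from", "one", "language"}
tagged = [("restate", "NUM"), ("words", "NOUN"), ("from", "ADP")]

fixed = retag_unknown(tagged, lambda w: w.lower() in known)
print(fixed)
```

The membership check is passed in as a function so that it can be backed by whatever vocabulary the model actually uses.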