aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

NER results different in demo than in python program #195

Open MarijaKiran opened 5 years ago

MarijaKiran commented 5 years ago

Hi,

I'm using polyglot 16.7.4 with Python 3.7.

For NER, I noticed that my Python program and the online demo give different results. The demo is at this link: https://sites.google.com/site/rmyeid/projects/polylgot-ner

When I choose the 'tokenize' option in the online demo, NER gives great results. In my Python program, however, I get different results, which seem equivalent to the demo's results when the 'tokenize' option is turned off.
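To illustrate why the 'tokenize' option could change what a word-level tagger sees, here is a minimal sketch using only the Python standard library. It does not use polyglot's own tokenizer. The regex, the two helper functions, and the example sentence are all hypothetical, chosen only to show how punctuation stays attached to words when the input is split on whitespace alone.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only, so punctuation stays glued to words."""
    return text.split()


def simple_tokens(text):
    """Separate runs of word characters from individual punctuation marks.

    This is a deliberately simple stand-in, not polyglot's tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical example sentence, not taken from testdata/cricket.txt.
sentence = "Sachin Tendulkar, born in Mumbai, played for India."

raw = whitespace_tokens(sentence)
tokenized = simple_tokens(sentence)

print(raw)        # tokens such as "Mumbai," keep the trailing comma
print(tokenized)  # "Mumbai" and "," are separate tokens
```

If a tagger looks up each token in a vocabulary, "Mumbai," and "Mumbai" may be treated as different words, which is one plausible reason the two demo settings could disagree.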

I've also noticed that the polyglot command line lets you tokenize before NER, for example: polyglot --lang en tokenize --input testdata/cricket.txt | polyglot --lang en ner | tail -n 20. However, I can't find an equivalent option when running NER from Python. Does that mean tokenization is done automatically before NER?

And if so, why do the NER results from my Python program differ from those of the online polyglot NER demo?

Thank you.