explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

'POS tagging' output is not correct #13241

Closed by adityakadrekar16 8 months ago

adityakadrekar16 commented 8 months ago

How to reproduce the behaviour

Hi, I tried the four English pipelines (en_core_web_sm, en_core_web_md, en_core_web_lg, en_core_web_trf) for POS tagging. I know that spaCy is case-sensitive, but why are words like 'Water', 'Wheat', 'Cereal', 'Dry', 'Information', 'Research', etc. tagged as proper nouns (NNP, PROPN)? These words are in title case. Even lowercase words like oil, dry, nutritional, law, express are tagged as NNP.

I thought maybe the 'md' model was not large enough to recognize them, but even larger models like 'lg' and 'trf' give poor results. Am I doing something wrong? Can you please help me?

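import spacy

nlp_trf = spacy.load("en_core_web_trf")
doc = nlp_trf(text)  # text is the input string (not included in this report)
for token in doc:
    # The underscore-suffixed attributes return strings; without the underscore they return integer IDs
    print(token.text, token.lemma_, token.tag_, token.pos_)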
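To narrow down which tokens are affected, one option is to filter the tagger output for tokens marked PROPN whose lowercase form is a word you expect to be a common noun or adjective. The sketch below is only an illustration: the helper name `find_suspect_proper_nouns` and the word list are hypothetical, and the sample tokens are hand-written stand-ins that mirror the tags described in this report, not actual model output. The helper only reads `.text`, `.tag_` and `.pos_`, so a real spaCy `Doc` could be passed in place of the stand-ins.

```python
from types import SimpleNamespace


def find_suspect_proper_nouns(tokens, common_words):
    """Return (text, tag, pos) for tokens tagged PROPN whose lowercase
    form appears in a caller-supplied list of common words.

    Hypothetical helper for illustration; it is not part of spaCy.
    """
    common = {word.lower() for word in common_words}
    return [
        (tok.text, tok.tag_, tok.pos_)
        for tok in tokens
        if tok.pos_ == "PROPN" and tok.text.lower() in common
    ]


# Stand-in tokens (assumed values mirroring the report, NOT real model output).
# With spaCy installed, replace `sample` with nlp_trf(text).
sample = [
    SimpleNamespace(text="Water", tag_="NNP", pos_="PROPN"),
    SimpleNamespace(text="oil", tag_="NNP", pos_="PROPN"),
    SimpleNamespace(text="London", tag_="NNP", pos_="PROPN"),
    SimpleNamespace(text="is", tag_="VBZ", pos_="AUX"),
]

suspects = find_suspect_proper_nouns(sample, ["water", "oil", "dry"])
for row in suspects:
    print(*row)
```

Printing the surrounding sentence next to each suspect token may also help, since the tagger's decision depends on context and isolated or heading-style text gives it little to work with.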

Thanks, Aditya

Your Environment

adityakadrekar16 commented 8 months ago

Hi @svlandeg, can you help me out here?

svlandeg commented 8 months ago

Hi! Please avoid tagging individual maintainers.

Let me transfer this to the discussion forum and follow up there.