allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0
1.66k stars, 223 forks

POS tagging results are not correct #502

Closed: adityakadrekar16 closed this issue 6 months ago

adityakadrekar16 commented 6 months ago

Hi, I tried the four English pipelines (en_core_web_sm, en_core_web_md, en_core_web_lg, and en_core_web_trf) for POS tagging. I know that spaCy is case-sensitive, but why are words like 'Water', 'Wheat', 'Cereal', 'Information', and 'Research' tagged as proper nouns (coarse tag PROPN, fine-grained tag NNP)?

I thought the 'md' model might not be large enough to recognize these words, but even the larger 'lg' and 'trf' models give poor results. Am I doing something wrong? Could you please help me?

Note: I am using spaCy version 3.6.1 on Python 3.10.13.
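For reference, here is a minimal sketch of how the tagging could be inspected. It is not the exact script behind the results above. The helper names `pos_tags`, `proper_nouns`, and `run_with_spacy` are hypothetical, the sample inputs are illustrative, and it assumes spaCy and the `en_core_web_sm` model are installed. It compares a capitalized word on its own with the same word inside a sentence, since the tagger has no context to work with for a single isolated token.

```python
from typing import Callable, Iterable, List, Tuple

Row = List[Tuple[str, str, str]]


def pos_tags(nlp: Callable, texts: Iterable[str]) -> List[Row]:
    """Return (token text, coarse POS, fine-grained tag) for each token of each input."""
    return [[(tok.text, tok.pos_, tok.tag_) for tok in nlp(text)] for text in texts]


def proper_nouns(rows: Iterable[Row]) -> List[str]:
    """Collect the token texts whose coarse POS is PROPN."""
    return [text for row in rows for (text, pos, _tag) in row if pos == "PROPN"]


def run_with_spacy(model_name: str = "en_core_web_sm") -> None:
    """Print tags for a few illustrative inputs (assumes the model is installed)."""
    import spacy  # imported lazily so the helpers above work without spaCy

    nlp = spacy.load(model_name)
    samples = [
        "Water",                          # isolated, capitalized
        "water",                          # isolated, lower-case
        "Water is essential for wheat.",  # capitalized only because it starts the sentence
    ]
    for sample, row in zip(samples, pos_tags(nlp, samples)):
        print(repr(sample), row)
```

Calling `run_with_spacy("en_core_web_lg")` or `run_with_spacy("en_core_web_trf")` would repeat the comparison with the other pipelines, provided those models have been downloaded.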

Thanks, Aditya

adityakadrekar16 commented 6 months ago

I submitted this here by mistake instead of the spaCy repository. Closing the issue.