How to reproduce the behaviour
Hi, I tried the four English pipelines (en_core_web_sm, en_core_web_md, en_core_web_lg, en_core_web_trf) for POS tagging. I know that spaCy is case-sensitive, but why are title-case words like 'Water', 'Wheat', 'Cereal', 'Dry', 'Information', and 'Research' tagged as proper nouns (NNP, PROPN)? Even lowercase words like 'oil', 'dry', 'nutritional', 'law', and 'express' are tagged as NNP.
I thought the 'md' model might not be large enough to recognize them, but even larger models like 'lg' and 'trf' give poor results. Am I doing something wrong? Can you please help me?
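import spacy

nlp_trf = spacy.load("en_core_web_trf")
doc = nlp_trf(text)  # text is the input string being tagged
for token in doc:
    # The underscore attributes return readable strings rather than integer hash IDs
    print(token.text, token.lemma_, token.tag_, token.pos_)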
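To make the report easier to check, the output could be narrowed down to just the tokens in question. The following is only a minimal sketch: the watch list is built from the words mentioned above, and the helper names `flag_proper_nouns`, `tag_text`, and `report` are hypothetical, not part of spaCy. It assumes the model passed to `spacy.load` is already installed.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, str, str]  # (token text, fine-grained tag, coarse POS)

# Hypothetical watch list: the common words named in this report.
COMMON_WORDS: Set[str] = {
    "water", "wheat", "cereal", "dry", "information",
    "research", "oil", "nutritional", "law", "express",
}


def flag_proper_nouns(rows: Iterable[Row], watch: Set[str] = COMMON_WORDS) -> List[Row]:
    """Keep rows tagged NNP or PROPN whose lowercased text is in the watch list."""
    return [
        (text, tag, pos)
        for text, tag, pos in rows
        if (tag == "NNP" or pos == "PROPN") and text.lower() in watch
    ]


def tag_text(text: str, model: str = "en_core_web_trf") -> List[Row]:
    """Run a spaCy pipeline and collect (text, tag_, pos_) for every token."""
    import spacy  # imported lazily so flag_proper_nouns works without spaCy

    nlp = spacy.load(model)
    return [(token.text, token.tag_, token.pos_) for token in nlp(text)]


def report(text: str, model: str = "en_core_web_trf") -> None:
    """Print only the watched words that the chosen model tags as proper nouns."""
    for row in flag_proper_nouns(tag_text(text, model)):
        print(row)
```

Calling `report(text)` with the same input, once per pipeline name, would show whether all four models flag the same words, which makes the behaviour easier to compare across models.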
Thanks, Aditya
Your Environment