I'm trying to lemmatize a text that I cleaned earlier. My problem was runtime, so I decided to cut out some of the pipeline components, since I only need lemmas. When I enabled only the lemmatizer I got some warnings, and I also want to filter tokens by POS tag (['ADJ', 'NOUN', 'VERB', 'ADV']). To get the `.pos_` attribute, I enabled the components the documentation pointed to, `tagger` and `parser`. However, that doesn't really work: I'm not getting the expected POS tags. With the full pipeline I get the expected results, but not with only some components enabled. Is this behaviour expected? If so, why? And how do I know which components I can safely exclude? I'm a bit confused now.
Thanks in advance!
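For context, this is roughly the filtering step I'm after, as a minimal sketch. `filter_lemmas`, `KEEP_POS` and `FakeToken` are hypothetical names; `FakeToken` is only a stand-in for a spaCy `Token` so the snippet runs without a model, and the demo tags are hand-written, not spaCy output. On a real `Doc`, the function only reads `.lemma_` and `.pos_`.

```python
from typing import Container, Iterable, List, NamedTuple

# Coarse-grained POS tags to keep, as listed in the question.
KEEP_POS = frozenset({"ADJ", "NOUN", "VERB", "ADV"})


def filter_lemmas(tokens: Iterable, keep: Container[str] = KEEP_POS) -> List[str]:
    """Return the lemma of every token whose coarse-grained .pos_ is in `keep`.

    Works on a spaCy Doc or any iterable of objects with .lemma_ and .pos_.
    """
    return [tok.lemma_ for tok in tokens if tok.pos_ in keep]


class FakeToken(NamedTuple):
    # Hypothetical stand-in for spacy.tokens.Token, used only for this demo.
    lemma_: str
    pos_: str


# Hand-written tags for "Stevia is simply nasty" (illustrative, not model output).
demo = [
    FakeToken("stevia", "PROPN"),
    FakeToken("be", "AUX"),
    FakeToken("simply", "ADV"),
    FakeToken("nasty", "ADJ"),
]
print(filter_lemmas(demo))  # ['simply', 'nasty']
```

This is why the `.pos_` values matter to me: if they come out wrong, the filter silently keeps or drops the wrong lemmas.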
How to reproduce the behaviour
Here is the code sample that doesn't work:
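import spacy

# enable= keeps only the listed components; all others (e.g. tok2vec, ner) are disabled.
nlp = spacy.load("en_core_web_sm", enable=["lemmatizer", "tagger", "parser", "attribute_ruler"])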
text = """
If you like the taste of Sweet Low get this If you don t don t Couldn t get through one cup of coffee
I m gonna give Stevia Extract in the Raw a try It s made by the folks at Sugar in the Raw Here s
what they claim Stevia Extract In The Raw gets its delicious natural sweetness from Rebiana an
extract from the Stevia plant This extract is the sweetest part of the plant and has recently
been isolated to provide pure sweetening power without the licorice like aftertaste that many
of our predecessors exhibited All you get is the sweet flavor without any calories
We ll see Simply Stevia is simply nasty
"""
print([t.pos_ for t in nlp(text)])
The one that works:
nlp = spacy.load('en_core_web_sm')
print([t.pos_ for t in nlp(text)])
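A minimal sketch of one way to reason about which components to keep: treat each component as depending on the components whose output it reads, then enable the transitive closure of what is actually needed. The dependency map and pipeline order below are assumptions about `en_core_web_sm` under spaCy v3 (in particular that `tagger` reads the shared `tok2vec` layer, that `attribute_ruler` maps `tag_` to `pos_`, and that the rule-based `lemmatizer` reads `pos_`); they are not read from spaCy. `components_to_enable`, `ASSUMED_DEPS` and `PIPE_ORDER` are hypothetical names.

```python
from typing import Dict, Iterable, List

# ASSUMED dependency map for en_core_web_sm under spaCy v3 (not read from spaCy):
# each component maps to the components whose output it needs at runtime.
ASSUMED_DEPS: Dict[str, List[str]] = {
    "tok2vec": [],
    "tagger": ["tok2vec"],              # assumed: listens to the shared tok2vec layer
    "parser": ["tok2vec"],              # assumed: same shared layer
    "ner": ["tok2vec"],                 # assumed: same shared layer
    "attribute_ruler": ["tagger"],      # assumed: maps tag_ to pos_
    "lemmatizer": ["attribute_ruler"],  # assumed: rule-based lemmatizer reads pos_
}
# ASSUMED order of the full pipeline (what nlp.pipe_names would list).
PIPE_ORDER: List[str] = ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner"]


def components_to_enable(wanted: Iterable[str],
                         deps: Dict[str, List[str]] = ASSUMED_DEPS,
                         order: List[str] = PIPE_ORDER) -> List[str]:
    """Return `wanted` plus everything it transitively depends on, in pipeline order."""
    needed = set()
    stack = list(wanted)
    while stack:  # iterative depth-first walk over the dependency map
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(deps.get(name, []))
    return [name for name in order if name in needed]


print(components_to_enable(["lemmatizer"]))
# ['tok2vec', 'tagger', 'attribute_ruler', 'lemmatizer']
```

Under that assumption the list differs from what I passed to `enable=` above: it adds `tok2vec` and drops `parser`. If the assumption holds, the result would go straight into `spacy.load("en_core_web_sm", enable=[...])`; after loading, comparing `nlp.pipe_names` with `nlp.disabled` (or running `nlp.analyze_pipes(pretty=True)`) shows what is actually active.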
Your Environment