chingan-tsc closed this issue 4 years ago.
I am sure that the spaCy lemmatizer would lemmatize the word "learning" to "learn".
To double check, can you provide the output of this command?
print([(token.text, token.lemma_) for token in doc])
I actually did try that, and interestingly, it returns something like this:
be, an, article, about, machine, learning, and, AI, in, general
Which means that in this longer sentence, the lemma of "learning" is still "learning".
But if you do
doc = nlp("machine learning")
for token in doc:
print(token.lemma_)
It would yield "machine", "learn".
What happens here is that the lemmatization is dependent on the POS tags. If you have a sentence like "The machine is learning an awful lot.", the word "learning" has the POS tag VERB and is lemmatized to "learn". In contrast, in a sentence like "This is an article about machine learning and AI in general.", "learning" will get the POS tag NOUN and the lemma will just keep the same form, "learning".
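To make the POS dependence concrete, here is a toy sketch (my own illustration, not spaCy's actual implementation) of a lemmatizer whose output is keyed on both the surface form and the POS tag:

```python
# Illustrative sketch only, NOT spaCy's internal lemmatizer: a toy lookup
# keyed by (word, POS), mirroring why "learning" lemmatizes differently
# depending on whether it is tagged VERB or NOUN.
LEMMA_TABLE = {
    ("learning", "VERB"): "learn",     # "The machine is learning ..."
    ("learning", "NOUN"): "learning",  # "... an article about machine learning"
}

def toy_lemma(word: str, pos: str) -> str:
    # Fall back to the lowercased surface form when no rule applies.
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())

print(toy_lemma("learning", "VERB"))  # learn
print(toy_lemma("learning", "NOUN"))  # learning
```

The point of the sketch is simply that the same surface form can map to different lemmas once POS is part of the key, which is exactly the behaviour observed in the two example sentences.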
I would say that these outputs are actually correct: "learning" is definitely a noun in your example, and the base form of that noun is "learning".
In that sense it's also not a bug in the Matcher. If you want to match on this more general case, I'm afraid you'll have to add a few additional rules. Hope that helps!
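One way such an additional rule could look (a sketch; the "IN" set operator is part of spaCy's Matcher pattern syntax) is to accept either the verb lemma or the noun form in a single token pattern:

```python
# Sketch of a broader Matcher rule: accept either the verb lemma "learn"
# or the noun lemma "learning" via the "IN" set operator.
pattern = [{"LEMMA": {"IN": ["learn", "learning"]}}]

# With spaCy and a model loaded, this would be registered roughly as:
#   matcher = Matcher(nlp.vocab)
#   matcher.add("LEARN", [pattern])
```

This way the pattern matches the token regardless of whether the tagger assigned it VERB (lemma "learn") or NOUN (lemma "learning").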
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
How to reproduce the behaviour
I am having an issue similar to https://github.com/explosion/spaCy/issues/5046, but in my case, I am sure that the spaCy lemmatizer would lemmatize the word "learning" to "learn". Hence, when I have a matcher with the pattern [{"LEMMA": "learn"}], I am expecting it to match "learning" as well, but it doesn't.
Your Environment