Closed chrisjbryant closed 1 year ago
Hi, thanks for the report! That does look like a bug.
In the more recent trained pipelines, the attribute_ruler takes care of these particular exceptions. You can have a look into them by printing nlp.get_pipe("attribute_ruler").patterns if you're interested.
For instance, for 're, the pipeline does have this correct:
{'patterns': [[{'TAG': 'VBP', 'LOWER': {'IN': ['are', "'re"]}}]], 'attrs': {'LEMMA': 'be', 'POS': 'AUX', 'MORPH': 'Mood=Ind|Tense=Pres|VerbForm=Fin'}, 'index': 0}
But for 've, the LEMMA is missing:
{'patterns': [[{'TAG': 'VBP', 'LOWER': {'IN': ['have', "'ve"]}}]], 'attrs': {'POS': 'AUX', 'MORPH': 'Mood=Ind|Tense=Pres|VerbForm=Fin'}, 'index': 0}
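If you want to check whether other rules have the same gap, here is a minimal sketch in plain Python. It is not part of spaCy: the helper name rules_without_lemma and the two constants are made up for illustration, and it assumes the rules are dicts shaped like the two entries shown above. It flags rules that match specific word forms (via LOWER or ORTH) but set no LEMMA. This is only a heuristic, since a rule without a LEMMA is not necessarily a bug.

```python
from typing import Any, Dict, List

# The two rules quoted in this thread, copied as-is.
RE_RULE = {
    "patterns": [[{"TAG": "VBP", "LOWER": {"IN": ["are", "'re"]}}]],
    "attrs": {"LEMMA": "be", "POS": "AUX",
              "MORPH": "Mood=Ind|Tense=Pres|VerbForm=Fin"},
    "index": 0,
}
VE_RULE = {
    "patterns": [[{"TAG": "VBP", "LOWER": {"IN": ["have", "'ve"]}}]],
    "attrs": {"POS": "AUX", "MORPH": "Mood=Ind|Tense=Pres|VerbForm=Fin"},
    "index": 0,
}


def rules_without_lemma(rules: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return rules that match on word form (LOWER/ORTH) but set no LEMMA."""
    flagged = []
    for rule in rules:
        if "LEMMA" in rule.get("attrs", {}):
            continue  # this rule already assigns a lemma
        # Flatten all token specs across the rule's alternative patterns.
        tokens = [tok for pattern in rule.get("patterns", []) for tok in pattern]
        if any("LOWER" in tok or "ORTH" in tok for tok in tokens):
            flagged.append(rule)
    return flagged


if __name__ == "__main__":
    for rule in rules_without_lemma([RE_RULE, VE_RULE]):
        print(rule["patterns"])
```

To run it against a loaded pipeline, you would pass nlp.get_pipe("attribute_ruler").patterns instead of the two hard-coded rules.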
The good news is that you can fix this in your own pipeline by adding a pattern to the attribute_ruler directly, e.g.:
import spacy

nlp = spacy.load("en_core_web_lg")
ruler = nlp.get_pipe("attribute_ruler")
# Same pattern as the existing rule, but with the missing LEMMA added
pattern = [{'TAG': 'VBP', 'LOWER': {'IN': ['have', "'ve"]}}]
attrs = {'POS': 'AUX', 'MORPH': 'Mood=Ind|Tense=Pres|VerbForm=Fin', 'LEMMA': 'have'}
ruler.add(patterns=[pattern], attrs=attrs, index=0)
Now, any time 've is tagged as VBP in a sentence, its lemma should be have, as in your example sentence:
I I
ca can
n't not
believe believe
they they
've have
not not
been be
in in
touch touch
We'll also have a look at updating this for the next version of our models!
This should be fixed in the v3.7.x models.
Small bug, but 've is currently not getting lemmatised as have in spaCy 3.6. Other contractions seem unaffected.