Closed Rahul-Chittora closed 1 year ago
Yes, I would imagine trying to match over 1 million patterns would take a lot of compute resources and time. spaczz is nowhere near as efficient as spaCy, and I do not currently have the time or resources to significantly improve spaczz's performance.
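For intuition on why the cost grows with the pattern count: naive fuzzy matching compares every pattern against every candidate span of the text, so work scales linearly with the number of patterns. Below is a minimal illustrative sketch using only the standard library's `difflib` — this is not spaczz's actual (more sophisticated) matcher, just a toy model of the brute-force scaling:

```python
import difflib

def fuzzy_ratio(a, b):
    # 0-100 similarity score, roughly comparable to spaczz's min_r thresholds.
    return int(difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)

def brute_force_match(text_tokens, patterns, min_r=90):
    # Naive fuzzy matching: every (label, pattern) pair is compared against
    # every candidate span, so runtime grows linearly with len(patterns).
    matches = []
    for label, pattern in patterns:
        n = len(pattern.split())
        for i in range(len(text_tokens) - n + 1):
            span = " ".join(text_tokens[i:i + n])
            if fuzzy_ratio(span, pattern) >= min_r:
                matches.append((label, span))
    return matches

tokens = "Grant Anderson visited the office".split()
patterns = [("NAME", "Grant Andersen")]
print(brute_force_match(tokens, patterns))  # the misspelled name still matches
```

With 1 million patterns, the inner comparison runs a million times per candidate span, which is consistent with the memory and latency numbers reported below.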
I am trying to add 1 million patterns. Adding them to a blank spaCy model:

```python
import spacy
from spaczz.pipeline import SpaczzRuler  # importing spaczz registers the factory

nlp = spacy.blank("en")
spaczz_ruler = nlp.add_pipe("spaczz_ruler")  # spaCy v3 syntax
spaczz_ruler.add_patterns(patterns)
```
This takes about 8 GB of RAM, and inference time is around 28 seconds. If I instead add the SpaczzRuler to my existing NER pipeline:

```python
spaczz_ruler = nlp.add_pipe("spaczz_ruler", before="ner")  # spaCy v3 syntax
```

it uses even more RAM and time, and fails even with 32 GB of RAM. The patterns look like this:

```python
patterns = [
    {
        "label": "NAME",
        "pattern": "Grant Andersen",
        "type": "fuzzy",
        "kwargs": {"min_r2": 90},
    }
]
```