Add pattern after adding to spacy pipeline taking long time and memory

gandersen101 / spaczz

Fuzzy matching and more functionality for spaCy.

MIT License

249 stars 27 forks

I am trying to add 1 million patterns. When I add them to a blank spaCy model:

    import spacy
    from spaczz.pipeline import SpaczzRuler

    nlp = spacy.blank("en")
    spaczz_ruler = SpaczzRuler(nlp)
    spaczz_ruler = nlp.add_pipe("spaczz_ruler")  # spaCy v3 syntax
    spaczz_ruler.add_patterns(patterns)

it takes 8 GB of RAM, and inference time is around 28 seconds.

If I instead add SpaczzRuler to my current NER pipeline using

    spaczz_ruler = nlp.add_pipe("spaczz_ruler", before="ner")  # spaCy v3 syntax

it uses a lot of RAM and time. It fails even on a machine with 32 GB of RAM. Example pattern:

    patterns = [
        {
            "label": "NAME",
            "pattern": "Grant Andersen",
            "type": "fuzzy",
            "kwargs": {"min_r2": 90},
        }
    ]
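To narrow down where the time and memory go, it may help to measure `add_patterns` on smaller pattern sets first and see how the cost grows. The following is a minimal sketch using only the standard library. The helper names `make_patterns` and `measure` are hypothetical (not part of spaCy or spaczz), the synthetic pattern text is made up, and `tracemalloc` only sees memory allocated through Python's allocators, so the numbers are a lower bound rather than the full process footprint.

```python
import time
import tracemalloc
from typing import Callable, Dict, List, Tuple


def make_patterns(n: int) -> List[Dict]:
    """Build n synthetic fuzzy patterns shaped like the example in this issue.

    Hypothetical helper: the "Person <i>" text is placeholder data.
    """
    return [
        {
            "label": "NAME",
            "pattern": f"Person {i}",
            "type": "fuzzy",
            "kwargs": {"min_r2": 90},
        }
        for i in range(n)
    ]


def measure(load: Callable[[List[Dict]], object], patterns: List[Dict]) -> Tuple[float, int]:
    """Call load(patterns) once and return (elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    load(patterns)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


if __name__ == "__main__":
    # Stand-in loader so the sketch runs without spaCy installed.
    # To measure the real thing, pass spaczz_ruler.add_patterns instead of list.
    for size in (1_000, 10_000, 100_000):
        seconds, peak = measure(list, make_patterns(size))
        print(f"{size} patterns: {seconds:.4f} s, peak {peak / 1e6:.2f} MB traced")
```

Running this once with the ruler on a blank pipeline and once with the ruler added before `ner` should show whether the extra cost comes from loading the patterns or from something else in the existing pipeline.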

Add pattern after adding to spacy pipeline taking long time and memory #74