Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

[TIP] 2x speed improvement with one changed line #77

Open ldorigo opened 2 years ago

ldorigo commented 2 years ago

Hi, I don't have time to make a PR right now, so this is just to let you know: simply excluding NER from the spaCy pipeline gives roughly a 2x speedup (at least when processing lots of short sentences).

You can do so by changing line 158 of core.py from

            self.nlp = spacy.load(spacy_lang)

to

            self.nlp = spacy.load(spacy_lang, exclude=["ner"])

Most likely you could also add a separate pipeline (something like self.nlp_nosyntax = spacy.load(spacy_lang, exclude=[...])) for matching without syntax, where most of the other components can be excluded as well, for an even larger speedup.
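
For anyone who wants to try this outside of core.py, here is a rough sketch of the idea, not the actual QuickUMLS code. The names `components_to_exclude`, `load_pipeline`, `need_syntax` and `loader` are made up for illustration, and the component names assume spaCy v3's stock English pipelines (for example en_core_web_sm). Which components are safe to drop in the no-syntax case is a guess that should be checked against the token attributes the matcher actually reads.

```python
from typing import Any, Callable, List, Optional

# Component names below assume a spaCy v3 English pipeline; other
# models may name their components differently.
NER_ONLY: List[str] = ["ner"]
# Hypothetical list for the "no syntax" case. Only extend it after
# checking which token attributes the matching code depends on.
NO_SYNTAX: List[str] = ["ner", "parser"]


def components_to_exclude(need_syntax: bool) -> List[str]:
    """Return a fresh list of pipeline components to leave out."""
    return list(NER_ONLY if need_syntax else NO_SYNTAX)


def load_pipeline(
    spacy_lang: str,
    need_syntax: bool = True,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load a spaCy pipeline without the components we do not need.

    `loader` defaults to spacy.load; it is a parameter only so the
    function can be exercised without spaCy or a model installed.
    """
    if loader is None:
        import spacy  # imported lazily so the helper itself has no hard dependency

        loader = spacy.load
    # `exclude` (spaCy v3) skips loading the components entirely,
    # unlike `disable`, which loads them but does not run them.
    return loader(spacy_lang, exclude=components_to_exclude(need_syntax))
```

With spaCy and a model installed, `load_pipeline("en_core_web_sm")` would stand in for the changed line, and `load_pipeline("en_core_web_sm", need_syntax=False)` for the hypothetical `nlp_nosyntax` case.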