Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

[FEATURE REQUEST] Option to add own spacy models in constructor #69

Open stephantul opened 3 years ago

stephantul commented 3 years ago

Hi,

thanks for the great package. I think it would be useful to have a constructor argument that specifies the spaCy model string to load (instead of a generic language attribute and a lookup). For example, I'm currently using scispacy's en_core_sci_sm to parse text instead of en_core_web_sm. I also think en_core_web_sm might be a bad default, given that most people using quickumls probably use it to parse biomedical text (although I don't have any numbers on performance).

This would also solve #68. I currently have a workaround, like this:

import spacy
from quickumls import QuickUMLS

# Build the matcher as usual, then replace its spaCy pipeline after the fact
q = QuickUMLS("my_path")
q.nlp = spacy.load("en_core_sci_sm")

I'd envision something like this:

from quickumls import QuickUMLS

q = QuickUMLS("my_path", spacy_model_string="en_core_sci_sm")
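
For what it's worth, the fallback logic behind such an argument could be quite small. Below is a minimal sketch, not the library's actual code: `resolve_model_name` and `DEFAULT_MODELS` are hypothetical names I made up for illustration, and the lookup table only stands in for whatever language-to-model mapping the constructor uses today. The idea is that an explicit `spacy_model_string` wins, and the existing language lookup stays as the default.

```python
from typing import Optional

# Hypothetical stand-in for the library's language-to-model lookup;
# the single entry here is illustrative only.
DEFAULT_MODELS = {"ENG": "en_core_web_sm"}


def resolve_model_name(
    language: str = "ENG",
    spacy_model_string: Optional[str] = None,
) -> str:
    """Pick the spaCy model name to load.

    An explicitly passed model string takes precedence; otherwise the
    name is looked up from the language code, as a default.
    """
    if spacy_model_string is not None:
        return spacy_model_string
    try:
        return DEFAULT_MODELS[language]
    except KeyError:
        # Fail loudly instead of silently loading an unrelated model
        raise ValueError(
            f"No default spaCy model for language {language!r}"
        ) from None
```

The constructor would then pass the returned name to `spacy.load`, so existing calls that only set the language keep working unchanged.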