Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

[BUG] QuickUMLS initialization spacy.load error in Spacy 3.0 #68

Open pokarats opened 3 years ago

pokarats commented 3 years ago

Describe the bug

When I tried to initialize QuickUMLS as in the following example:

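# Keyword arguments are used because the __init__ signature shown in the
# traceback below lists window before similarity_name.
matcher = QuickUMLS(quickumls_fp, overlapping_criteria=overlapping_criteria,
                    threshold=threshold, similarity_name=similarity_name,
                    window=window, accepted_semtypes=accepted_semtypes)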

I got the error below, even though I had already run python -m spacy download en and verified that spacy.load('en_core_web_sm') works.

OSError: [E941] Can't find model 'en'. It looks like you're trying to load a model from a shortcut, which is deprecated as of spaCy v3.0. To load the model, use its full name instead:

nlp = spacy.load("en_core_web_sm")

To Reproduce

As above.

Environment

Additional context

Full error message below:

OSError                                   Traceback (most recent call last)
~/opt/anaconda3/envs/quickUMLS/lib/python3.7/site-packages/quickumls/core.py in __init__(self, quickumls_fp, overlapping_criteria, threshold, window, similarity_name, min_match_length, accepted_semtypes, verbose, keep_uppercase)
    149         try:
--> 150             self.nlp = spacy.load(spacy_lang)
    151         except OSError:

~/opt/anaconda3/envs/quickUMLS/lib/python3.7/site-packages/spacy/__init__.py in load(name, disable, exclude, config)
     46     """
---> 47     return util.load_model(name, disable=disable, exclude=exclude, config=config)
     48 

~/opt/anaconda3/envs/quickUMLS/lib/python3.7/site-packages/spacy/util.py in load_model(name, vocab, disable, exclude, config)
    327     if name in OLD_MODEL_SHORTCUTS:
--> 328         raise IOError(Errors.E941.format(name=name, full=OLD_MODEL_SHORTCUTS[name]))
    329     raise IOError(Errors.E050.format(name=name))

OSError: [E941] Can't find model 'en'. It looks like you're trying to load a model from a shortcut, which is deprecated as of spaCy v3.0. To load the model, use its full name instead:

nlp = spacy.load("en_core_web_sm")

For more details on the available models, see the models directory: https://spacy.io/models. If you want to create a blank model, use spacy.blank: nlp = spacy.blank("en")

To fix this error, I had to change the SPACY_LANGUAGE_MAP dict in constants.py (line 188) so that it maps to 'en_core_web_sm' instead of 'en'. I imagine I could have downgraded to an earlier version of spaCy instead, but v3.0 is what was installed by default during the quickumls installation. With spaCy 3.0, this will probably be an issue for the other languages as well, so maybe SPACY_LANGUAGE_MAP should be changed to:

SPACY_LANGUAGE_MAP = {
    'ENG': 'en_core_web_sm',
    'GER': 'de_core_news_sm',
    'SPA': 'es_core_news_sm',
    'POR': 'pt_core_news_sm',
    'FRE': 'fr_core_news_sm',
    'ITA': 'it_core_news_sm',
    'DUT': 'nl_core_news_sm',
    'XXX': 'xx'
}
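
A mapping like the one proposed here could be paired with a small lookup-and-load helper, so that a missing model produces an actionable message instead of a bare OSError. The following is a minimal sketch, not QuickUMLS code: the names FULL_MODEL_NAMES, resolve_model_name, and load_pipeline are hypothetical. The 'XXX' entry is left out on purpose, because 'xx' is not a full package name and this thread does not settle what should replace it. spaCy is imported lazily so the lookup itself works without spaCy installed.

```python
# Hypothetical helper, not part of QuickUMLS: map UMLS language codes to
# full spaCy package names and load the pipeline with a clearer error.

# Full package names taken from the mapping proposed in this issue.
FULL_MODEL_NAMES = {
    "ENG": "en_core_web_sm",
    "GER": "de_core_news_sm",
    "SPA": "es_core_news_sm",
    "POR": "pt_core_news_sm",
    "FRE": "fr_core_news_sm",
    "ITA": "it_core_news_sm",
    "DUT": "nl_core_news_sm",
}


def resolve_model_name(umls_language: str) -> str:
    """Return the full spaCy package name for a UMLS language code."""
    try:
        return FULL_MODEL_NAMES[umls_language]
    except KeyError:
        known = ", ".join(sorted(FULL_MODEL_NAMES))
        raise ValueError(
            f'No spaCy model configured for "{umls_language}" (known codes: {known})'
        ) from None


def load_pipeline(umls_language: str):
    """Load the spaCy pipeline for a UMLS language code."""
    import spacy  # lazy import: only needed when a pipeline is actually loaded

    name = resolve_model_name(umls_language)
    try:
        return spacy.load(name)
    except OSError as err:
        # spacy.load raises OSError when the package is not installed.
        raise OSError(
            f'spaCy model "{name}" is not installed; '
            f'run "python -m spacy download {name}" first'
        ) from err
```

With this sketch, load_pipeline("ENG") would attempt spacy.load("en_core_web_sm"). Full package names can be loaded by spaCy 2.x as well as 3.x, so this lookup would not need to branch on the installed spaCy version.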

obdeveloper-question-acount commented 3 years ago

I have this problem too, but I'm using Windows and ChatterBot.