Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

__init__() got an unexpected keyword argument 'spacy_component' #65

Open eunsuk-c opened 3 years ago

eunsuk-c commented 3 years ago

Describe the bug

When I run the following code:

import spacy
from quickumls.spacy_component import SpacyQuickUMLS

nlp = spacy.load('en_core_web_sm')
quickumls_component = SpacyQuickUMLS(nlp, '/home/silverock/umls/quickUMLS')
nlp.add_pipe(quickumls_component)

I got this message:


TypeError                                 Traceback (most recent call last)
      2
      3 nlp = spacy.load('en_core_web_sm')
----> 4 quickumls_component = SpacyQuickUMLS(nlp, '/home/silverock/umls/quickUMLS')
      5 nlp.add_pipe(quickumls_component)

~/anaconda3/lib/python3.8/site-packages/quickumls/spacy_component.py in __init__(self, nlp, quickumls_fp, best_match, ignore_syntax, **kwargs)
     23         """
     24
---> 25         self.quickumls = QuickUMLS(quickumls_fp,
     26             # By default, the QuickUMLS objects creates its own internal spacy pipeline but this is not needed
     27             # when we're using it as a component in a pipeline

TypeError: __init__() got an unexpected keyword argument 'spacy_component'

Could you solve this issue?

My Environment

  • OS: Ubuntu 20.04
  • QuickUMLS version: 1.4
  • UMLS version: 2020AB
  • spaCy version: 2.4.5
  • Python version: 3.8.5
jimhavrilla commented 3 years ago

I think I'm getting a similar error to this. It tells me there is "no such module" as spacy_component.

pokarats commented 3 years ago

@eunsuk-c I got this error too. Apparently pip install quickumls did not install the latest commit from this repo. At least that's the case for me, and I don't know why. When I compared the core.py in my environment against the core.py in the repo, I found the differences below. You may want to check whether the same is true in your environment.

In short, the spacy_component=False argument is missing from the __init__ method of the QuickUMLS class in the installed core.py. So I simply replaced the core.py in my environment with the latest version from this repo, which has the spacy_component argument.
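To check whether your own install has the outdated core.py before replacing anything, you can inspect the signature of QuickUMLS.__init__ with Python's standard inspect module. This is a minimal sketch: the helper name accepts_kwarg is hypothetical, and the import path quickumls.core is taken from the installed file compared in the diff.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with a keyword argument called `name`.

    A function that takes **kwargs accepts any keyword, so it also counts.
    """
    params = inspect.signature(func).parameters.values()
    for param in params:
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return any(param.kind is inspect.Parameter.VAR_KEYWORD for param in params)


try:
    # Assumed import path, matching site-packages/quickumls/core.py from the diff.
    from quickumls.core import QuickUMLS
except ImportError:
    QuickUMLS = None

if QuickUMLS is None:
    print("quickumls is not importable in this environment")
elif accepts_kwarg(QuickUMLS.__init__, "spacy_component"):
    print("Installed core.py accepts spacy_component")
else:
    print("Installed core.py is older than the repo: no spacy_component argument")
```

If the last message is printed, SpacyQuickUMLS will fail with the same TypeError as in the original report.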

After updating core.py, I was able to instantiate SpacyQuickUMLS(nlp, '<path to QuickUMLS data>'). That said, this whole pipeline does not work with spaCy 3.0, but since you're using spaCy 2.4.5, I assume this is not an issue for you.
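One concrete reason the pipeline above breaks on spaCy 3.0 is that nlp.add_pipe() changed: spaCy 2.x accepts a component object such as a SpacyQuickUMLS instance, while spaCy 3.x expects the string name of a registered component factory. A minimal sketch of a version guard, assuming you only need to decide which style applies; the helper name add_pipe_accepts_objects is hypothetical.

```python
import re


def add_pipe_accepts_objects(spacy_version):
    """Return True for spaCy 2.x, where nlp.add_pipe() takes a component object.

    spaCy 3.x expects the string name of a registered factory instead,
    so passing a SpacyQuickUMLS instance directly no longer works there.
    """
    match = re.match(r"(\d+)\.", spacy_version)
    if match is None:
        raise ValueError(f"Unrecognized spaCy version: {spacy_version!r}")
    return int(match.group(1)) < 3


try:
    import spacy
    print(spacy.__version__, "add_pipe accepts objects:",
          add_pipe_accepts_objects(spacy.__version__))
except ImportError:
    print("spaCy is not installed")
```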

$ diff -w /Users/<username>/opt/anaconda3/envs/quickUMLS/lib/python3.7/site-packages/quickumls/core.py QuickUMLS/quickumls/core.py 

29c29,30
<             verbose=False, keep_uppercase=False):
---
>             verbose=False, keep_uppercase=False,
>             spacy_component = False):
148a150,154
>         # if this is not being executed as as spacy component, then it must be standalone
>         if spacy_component:
>             # In this case, the pipeline is external to this current class
>             self.nlp = None
>         else:
325d330
<                 for cui, semtypes, preferred in cuisem_match:
331a337
> 
334a341,342
>                 for cui, semtypes, preferred in cuisem_match:
> 
438a447,468
>         # pass in parsed spacy doc to get concept matches
>         matches = self._match(parsed)
> 
>         return matches
>         
>     def _match(self, doc, best_match=True, ignore_syntax=False):
>         """Gathers ngram matches given a spaCy document object.
> 
>         [extended_summary]
> 
>         Args:
>             text (Document): spaCy Document object to be used for extracting ngrams
> 
>             best_match (bool, optional): Whether to return only the top match or all overlapping candidates. Defaults to True.
>             ignore_syntax (bool, optional): Wether to use the heuristcs introduced in the paper (Soldaini and Goharian, 2016). TODO: clarify,. Defaults to False
> 
>         Returns:
>             List: List of all matches in the text
>             TODO: Describe format
>         """
>         
>         ngrams = None
440c470
<             ngrams = self._make_token_sequences(parsed)
---
>             ngrams = self._make_token_sequences(doc)
442c472
<             ngrams = self._make_ngrams(parsed)
---
>             ngrams = self._make_ngrams(doc)
449c479
<         self._print_verbose_status(parsed, matches)
---
>         self._print_verbose_status(doc, matches)

My Environment

pokarats commented 3 years ago

> I think I'm getting a similar error to this. It tells me there is "no such module" as spacy_component.

@jimhavrilla I had this error as well. For some reason, when I installed this library with pip install quickumls, spacy_component.py was not installed in my environment's site-packages. Perhaps that is also what happened with your install? I had to manually copy spacy_component.py into the directory with the rest of the module files for this to work.
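A quick way to confirm this without browsing site-packages by hand is to ask Python where the package is installed and which module files are present there. This is a minimal sketch using the standard importlib.util.find_spec; the helper names installed_package_dir and missing_modules are hypothetical, and the package name quickumls comes from the traceback path.

```python
import importlib.util
from pathlib import Path


def installed_package_dir(package):
    """Return the directory `package` is imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(package)  # does not execute the package
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a single-file module rather than a package
    return Path(list(spec.submodule_search_locations)[0])


def missing_modules(package, expected):
    """List which of the `expected` module names (without .py) have no file in the package."""
    pkg_dir = installed_package_dir(package)
    if pkg_dir is None:
        return list(expected)  # nothing is installed, so everything is missing
    return [name for name in expected if not (pkg_dir / f"{name}.py").is_file()]


print("quickumls installed at:", installed_package_dir("quickumls"))
print("missing modules:", missing_modules("quickumls", ["core", "spacy_component"]))
```

If spacy_component appears in the missing list, the "no such module" error is expected, and the printed directory is where the file from the repo needs to go.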

My Environment

jimhavrilla commented 3 years ago

Yes, this seems to be correct. I also hit a strange bug during installation: whenever I tried to run pip install again, it kept trying to reinstall spaCy 3.0. Possibly related to what you described?

Jim Havrilla

On Fri, Feb 26, 2021, 8:31 AM Noon Pokaratsiri Goldstein < notifications@github.com> wrote:


My Environment

  • OS: macOS Big Sur
  • QuickUMLS version 1.4 (I installed this Feb 2021)
  • UMLS version: 2019AB
  • spaCy 3.0 (note that there are other issues with spaCy 3.0 and this library)
  • Python 3.7
  • anaconda environment


jmugan commented 3 years ago

I had to overwrite core.py with the version from the repo.
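The workaround in this thread amounts to copying the repo's core.py over the pip-installed one. A minimal sketch that does this while keeping a backup, so the change can be undone once a fixed release is available; the function name overwrite_with_backup and the .bak naming are hypothetical choices, not part of QuickUMLS.

```python
import shutil
from pathlib import Path


def overwrite_with_backup(installed_file, replacement_file):
    """Copy `replacement_file` over `installed_file`, keeping the old file as <name>.bak.

    Returns the path of the backup so the change can be reverted later.
    """
    installed = Path(installed_file)
    backup = installed.with_name(installed.name + ".bak")
    shutil.copy2(installed, backup)            # keep the pip-installed version
    shutil.copy2(replacement_file, installed)  # drop in the version from the repo
    return backup
```

For example, pass the site-packages path of quickumls/core.py and the repo's QuickUMLS/quickumls/core.py, the same two files compared by the diff command in this thread.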

DSLituiev commented 2 years ago

Can someone push this to PyPI?

@soldni @galtay @ldorigo @burgersmoke

burgersmoke commented 1 year ago

@DSLituiev As part of medspacy, we have started our own fork of QuickUMLS. This fork supports spaCy v3.x, and we have addressed this issue in version 2.5 of medspacy_quickumls.

Please note that if you decide to use either of the options below, the medspacy team has elected to no longer support leveldb as a database backend: we have encountered problems with it and do not have the time or resources to troubleshoot and fix them.

That repo is here: https://github.com/medspacy/QuickUMLS

It's also now pip-installable here: https://pypi.org/project/medspacy-quickumls/