Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

about `nlp.add_pipe` in the demo #79

Open newbietuan opened 2 years ago

newbietuan commented 2 years ago

Describe the bug

When I run the demo code, something goes wrong at "nlp.add_pipe(quickumls_component)":

Traceback (most recent call last):
  File "umlsdemo.py", line 8, in <module>
    nlp.add_pipe(quickumls_component)
  File "/home/mayt/anaconda3/envs/umls/lib/python3.7/site-packages/spacy/language.py", line 769, in add_pipe
    raise ValueError(err)
ValueError: [E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <quickumls.spacy_component.SpacyQuickUMLS object at 0x7f24b35e5cd0> (name: 'None').

To Reproduce

Environment

Additional context

It seems to be related to spaCy, according to https://stackoverflow.com/questions/67906945/valueerror-nlp-add-pipe-now-takes-the-string-name-of-the-registered-component-f, but I still don't know how to modify the code.

ygivenx commented 2 years ago

It can be used like this.

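import spacy
from spacy.language import Language
from quickumls.spacy_component import SpacyQuickUMLS

# nlp must already be loaded, e.g. nlp = spacy.load('en_core_web_sm')
@Language.component('quickumls_component')
def quickumls_component(doc):
    # replace the placeholder with your QuickUMLS install path, as a string
    return SpacyQuickUMLS(nlp, <Path to quickUmls install dir>)(doc)

nlp.add_pipe('quickumls_component', last=True)

# full_rpts is a pandas object holding report texts; any string works here
doc = nlp(full_rpts.iloc[0])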

shrimonmuke0202 commented 1 year ago

Hi everyone, when I use this code I get this error: [E090] Extension 'similarity' already exists on Span. To overwrite the existing extension, set force=True on Span.set_extension.

ghost commented 1 year ago

@shrimonmuke0202 did you solve this problem? [E090] Extension 'similarity' already exists on Span. To overwrite the existing extension, set force=True on Span.set_extension.
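
A likely cause of E090, assuming SpacyQuickUMLS registers its Span extensions (such as 'similarity') in its constructor: the component function in the snippet above builds a new SpacyQuickUMLS for every document, so the second build tries to register the same extension again. Below is a minimal sketch of a build-once wrapper. The names make_lazy_component, build and fake_build are hypothetical and not part of spaCy or QuickUMLS, and a stand-in builder is used so the sketch runs on its own.

```python
from functools import lru_cache


def make_lazy_component(build):
    """Return a doc -> doc function that calls build() at most once.

    build is a hypothetical zero-argument callable that constructs the real
    component, e.g. lambda: SpacyQuickUMLS(nlp, quickumls_fp).
    """

    @lru_cache(maxsize=None)
    def get_component():
        # The first call runs build(); later calls reuse the cached result.
        return build()

    def component(doc):
        return get_component()(doc)

    return component


# Stand-in builder so the sketch runs without spaCy or QuickUMLS installed.
build_calls = []


def fake_build():
    build_calls.append(1)
    return lambda doc: doc.upper()


component = make_lazy_component(fake_build)
first = component("chest pain")
second = component("nausea")
print(first, second, len(build_calls))  # prints: CHEST PAIN NAUSEA 1
```

With spaCy 3, the wrapped function could presumably be registered via Language.component('quickumls_component', func=make_lazy_component(lambda: SpacyQuickUMLS(nlp, quickumls_fp))) before calling nlp.add_pipe('quickumls_component', last=True), where quickumls_fp stands for your own install path.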

ysu1213 commented 1 year ago

Hi there, thank you so much for sharing a solution! I was able to get past the add_pipe error but no further. Could you explain what the line doc = nlp(full_rpts.iloc[0]) does? I tried passing in something like doc = nlp('Pt c/o shortness of breath, chest pain, nausea, vomiting, diarrrhea'), but that does not work. Initially I tried copying and pasting your code in full, but it fails with an error saying "full_rpts" is not defined. Is there some missing context about this line of code? Thank you so much!

ygivenx commented 1 year ago

full_rpts.iloc[0] returns a string from a pandas DataFrame in my own code, so doc = nlp('Pt c/o shortness of breath, chest pain, nausea, vomiting, diarrrhea') is correct. Did you update the QuickUMLS install location in the code below?

def quickumls_component(doc):
    return SpacyQuickUMLS(nlp, <Path to quickUmls install dir>)(doc)

gah-bo commented 1 year ago

Is the Path to quickUmls install dir supposed to be the same as quickumls_fp in this code block?

matcher = QuickUMLS(quickumls_fp, ...)

If so, I am doing that, yet I get this message:

Loading QuickUMLS resources from a default SAMPLE of UMLS data from here: /opt/conda/envs/python38/lib/python3.8/site-packages/resources/quickumls/QuickUMLS_SAMPLE_lowercase_POSIX_unqlite

I also get no output from the print statements in the OP's code block.

However, this works fine:

from quickumls import QuickUMLS

# Initialize the QuickUMLS matcher (overlapping criteria "score", threshold 0.99)
matcher = QuickUMLS("./libraries/quickumls", "score", 0.99)

def quick_UMLS_match(medical_text):
    # Cap the input at 1,000,000 characters; slicing leaves shorter text unchanged
    processed_text = medical_text[:1000000]
    return matcher.match(processed_text, best_match=True, ignore_syntax=False)

But I am trying to implement medspacy, because I extract items from the QuickUMLS output in a super inefficient way and medspacy seems like the proper way. For what it's worth, this is how I do it:

def quick_UMLS_extractor(matcher_output, return_field, unique=True):
    # Flatten the nested match lists and pull one field from each match dict
    return_items = [entity[return_field] for sublst in matcher_output for entity in sublst]

    # Optionally drop duplicates (set() does not preserve the original order)
    if unique:
        return_items = list(set(return_items))
    return return_items
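
Because list(set(...)) does not keep the original order, a variant that deduplicates while preserving first-seen order may be handy. This is a minimal sketch, assuming the matcher output is a list of lists of dicts, as the comprehension above implies. The name extract_field and the sample data (including the CUI values) are made up for illustration and are not real QuickUMLS output.

```python
def extract_field(matcher_output, field, unique=True):
    """Flatten nested match lists and collect one field per match."""
    items = [match[field] for group in matcher_output for match in group]
    if unique:
        # dict keys keep insertion order, so repeats are dropped
        # while the first-seen order is preserved
        items = list(dict.fromkeys(items))
    return items


# Made-up sample shaped like nested match lists (not real QuickUMLS output).
sample_output = [
    [{"cui": "C0000001", "term": "chest pain"},
     {"cui": "C0000002", "term": "pain"}],
    [{"cui": "C0000001", "term": "chest pain"}],
]

cuis = extract_field(sample_output, "cui")
all_terms = extract_field(sample_output, "term", unique=False)
print(cuis)       # ['C0000001', 'C0000002']
print(all_terms)  # ['chest pain', 'pain', 'chest pain']
```

As with set(), this only works when the extracted values are hashable, such as strings or numbers.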