allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

EntityLinker knowledge base returns CUIs not MeSH IDs when 'mesh' is selected #355

Open xegulon opened 3 years ago

xegulon commented 3 years ago

I'm using the scispaCy entity linker with this snippet:

from scispacy.linking import EntityLinker
import spacy, scispacy

config = {
    "resolve_abbreviations": True,  
    "name": "mesh", 
    "max_entities_per_mention":1
}

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker", config=config) 

linker = nlp.get_pipe("scispacy_linker")

def mesh_extractor(text):
    doc = nlp(text)
    for e in doc.ents:
        if e._.kb_ents:
            cui = e._.kb_ents[0][0]
            print(e, cui)

text = "Give him three injection of paracetamol"

Then, when I use it:

>> mesh_extractor(text)
Give C1947971
injection C0021485

But according to the scispaCy README, the MeSH linker should return MeSH IDs (for example, D003435), not UMLS CUIs. How can I fix this? Did I misunderstand something?

dakinggg commented 3 years ago

ahh, the config parameter is called linker_name, not name. If you set linker_name instead, it should work.
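For reference, a minimal sketch of the snippet from the question with the key renamed as suggested. The `LINKER_CONFIG` constant and the `build_pipeline` helper are hypothetical names used only for illustration; the model name and the other config values are taken from the question. Later comments in this thread report that recent linker releases may still return UMLS CUIs even with `linker_name` set, so this is not guaranteed to produce MeSH IDs on every version.

```python
# Same config as in the question, but with "linker_name" instead of "name".
LINKER_CONFIG = {
    "resolve_abbreviations": True,
    "linker_name": "mesh",
    "max_entities_per_mention": 1,
}


def build_pipeline(model_name="en_core_sci_sm"):
    """Hypothetical helper: load a scispaCy model and attach the linker."""
    # Imported lazily so this sketch can be read and loaded without
    # spaCy/scispaCy installed; calling the function requires both.
    import spacy
    from scispacy.linking import EntityLinker  # noqa: F401 (registers "scispacy_linker")

    nlp = spacy.load(model_name)
    nlp.add_pipe("scispacy_linker", config=LINKER_CONFIG)
    return nlp
```

Calling `build_pipeline()` and then running the `mesh_extractor` loop from the question on the returned `nlp` object should show whichever IDs the installed linker version uses.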

xegulon commented 3 years ago

Thanks a lot!

Braianpp commented 1 year ago

I am getting the same problem even when using linker_name in the config:

config = {
    "resolve_abbreviations": True,
    "linker_name": "mesh",
    "max_entities_per_mention": 5,
}

nlp = spacy.load("en_core_sci_md")

nlp.add_pipe("scispacy_linker", config=config)

linker = nlp.get_pipe("scispacy_linker")

doc = nlp("Pre-diabetes Obesity Type-2 Diabetes Mellitus Obesity Overweight")

for e in doc.ents:
    if e._.kb_ents:
        cui = e._.kb_ents[0][0]
        print(e, cui)

and I get:
Pre-diabetes C0362046
Obesity C0028754
Diabetes Mellitus C0011849
Obesity C0028754
Overweight C0497406

I also tried another scispaCy model in the same script, nlp = spacy.load("en_ner_bionlp13cg_md"). I don't know if that matters.

dakinggg commented 1 year ago

Hi, it looks like the original mesh linker was created with a separate kb, rather than just a subset of UMLS. The process for creating the linker may have been lost. When I recreated the linkers for the latest UMLS release, I just used a subset of UMLS to produce the mesh linker. I'll have to look into this and decide whether to just stick to the current UMLS ids, or try to recreate the old version of the linker. Sorry about that. For now you will need to map between UMLS id and mesh id yourself.

Braianpp commented 1 year ago

I see. Maybe I will try the previous scispaCy version (0.5.1), which should work. Thank you very much for answering my question!

JohnGiorgi commented 1 year ago

I'm also facing this problem, but I am able to map from UMLS CUIs to MeSH IDs using the MRCONSO.RRF file.
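A rough sketch of what such a mapping could look like, not necessarily the approach used above. It assumes the standard pipe-delimited MRCONSO.RRF layout from a UMLS release, where field 0 is the CUI, field 11 is the source abbreviation (SAB, `MSH` for MeSH) and field 13 is the source's own code. The function name `load_cui_to_mesh` is hypothetical.

```python
from collections import defaultdict

# Field positions in MRCONSO.RRF (pipe-delimited, no header row).
# These follow the documented UMLS column order; verify against your release.
CUI_COL, SAB_COL, CODE_COL = 0, 11, 13


def load_cui_to_mesh(lines):
    """Build a {CUI: [MeSH codes]} dict from an iterable of MRCONSO.RRF lines."""
    mapping = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE_COL:
            continue  # skip malformed or truncated rows
        if fields[SAB_COL] == "MSH":
            mapping[fields[CUI_COL]].add(fields[CODE_COL])
    # A CUI can map to several MeSH codes, so keep all of them, sorted.
    return {cui: sorted(codes) for cui, codes in mapping.items()}
```

With the file opened as text (for example `open("MRCONSO.RRF", encoding="utf-8")`, path assumed), the resulting dict can be used to look up the CUI taken from `e._.kb_ents[0][0]`; CUIs with no MeSH entry are simply absent from it.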