Closed almogmor closed 1 year ago
Hi, there are two components related to entity recognition and linking in scispacy. One is the Named Entity Recognition (NER) component, which identifies textual spans that are likely to be entities (and, depending on which scispacy model you use, also their broad type). This information can be accessed, as you've done, via doc.ents and doc.ents[0].ent_type_. The second is the Entity Linking component, which is the one you specify mesh/hpo for. That component takes the textual spans selected by the NER component and attempts to link each one to an entity in the knowledge base. That information can be accessed via doc.ents[0]._.kb_ents. Hope that helps!
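To make the split between the two components concrete, here is a minimal sketch that collects the NER output and the linker output side by side. It assumes doc was produced by a scispacy pipeline that already has the "scispacy_linker" pipe added; the function name summarize_entities is hypothetical, and the span label is read through spaCy's label_ attribute.

```python
def summarize_entities(doc):
    """Collect NER output and linker output for each entity span in a doc.

    Assumption: `doc` came from a scispacy pipeline with the
    "scispacy_linker" pipe added, so each span has the `_.kb_ents` extension.
    """
    rows = []
    for ent in doc.ents:
        rows.append({
            "text": ent.text,              # span selected by the NER component
            "label": ent.label_,           # broad type from NER (model dependent)
            "links": list(ent._.kb_ents),  # (concept_id, score) pairs from the linker
        })
    return rows
```

Calling summarize_entities(nlp(text)) should then return one dictionary per recognized span, with an empty "links" list for any span the linker could not match.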
Thanks for the quick response, yes it does help. I see now that the entity linking is a different component.
But I couldn't find a way to map back from an ID, e.g. 'C0346073', to the name of the entity in the knowledge base ('mesh'/'hpo').
I have a similar question. In the above example itself, in spite of using hpo as the linker, the ID returned is C0346073 instead of HP:0012329, as we'd expect from the mapping shown here. I tried go as well and got the same result. Am I missing something?
All of the ontology options are implemented as subsets of UMLS. We don't have any cross mapping to the root ontology identifier. You would have to get that from UMLS or another source. The entity information available from UMLS in scispacy can be accessed as in the example code:
linker = nlp.get_pipe("scispacy_linker")
for umls_ent in entity._.kb_ents:
    # Each umls_ent is a (CUI, score) pair; look up the full record by CUI.
    print(linker.kb.cui_to_entity[umls_ent[0]])
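For the earlier question about getting from an ID such as 'C0346073' back to a name, the same lookup can be wrapped in a small helper. This is only a sketch: link_names is a hypothetical name, and it assumes that the records stored in linker.kb.cui_to_entity expose a canonical_name field.

```python
def link_names(entity, linker):
    """Return (cui, canonical_name, score) for each candidate link of a span.

    Assumptions: `entity` is a span with the `_.kb_ents` extension, and
    `linker` is the pipe returned by nlp.get_pipe("scispacy_linker").
    """
    results = []
    for cui, score in entity._.kb_ents:
        kb_entry = linker.kb.cui_to_entity[cui]  # full knowledge base record for this CUI
        results.append((cui, kb_entry.canonical_name, score))
    return results
```

Note that the IDs returned are still UMLS CUIs. As stated above, mapping them to native ontology identifiers such as HP:0012329 would need UMLS itself or another source.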
Then how do linkers like hpo and go change the output?
They link to subsets of UMLS that are more specific than the full UMLS. If you know that you only want entities that fall into one of those subsets, this is useful for at least two reasons: 1) the downloaded file is much smaller and memory usage is lower, and 2) the results will have higher precision, because you won't get links to entities of a different type that you are not interested in.
Is there any way to map the 'mesh' or 'hpo' linkers back to the relevant UMLS entities? In other words, if I'm using the umls linker, can I filter which entities are 'mesh' related and which are 'hpo' related?
e.g.
The mesh and hpo linker entities should contain the exact same information as the umls linker entities since they are just a subset.
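Since the subsets share CUIs with the full UMLS knowledge base, one possible way to filter umls linker results is to check each CUI for membership in the subset knowledge bases. The sketch below is not something scispacy provides: tag_subsets is a hypothetical helper, and it assumes you have loaded the mesh and hpo linkers separately and can pass each one's kb.cui_to_entity dictionary (or any container of CUIs).

```python
def tag_subsets(cuis, subset_kbs):
    """Map each CUI to the names of the subset knowledge bases that contain it.

    `subset_kbs` maps a label such as "mesh" or "hpo" to any container that
    supports `in` for CUIs, e.g. a linker's `kb.cui_to_entity` dict
    (an assumption about how the subsets are made available).
    """
    return {
        cui: sorted(name for name, kb in subset_kbs.items() if cui in kb)
        for cui in cuis
    }
```

A CUI that maps to an empty list would be one that the umls linker found but that belongs to neither subset.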
Hi, I'm trying to annotate data using scispacy. Loading "mesh" and "hpo" gives the exact same results no matter what the input is. For example:
I tried many texts and both linkers produced the same results.