Open eric-czech opened 2 years ago
Hey Eric, thanks for the pointer to Gilda! I'm not familiar with it, but will have a look at some point. I'd like scispacy to have a trained entity linker (one that compares the textual context with the entity definitions) on top of the entity linking (more of a candidate generator) that we added a while ago, but it's not something I've had time to do. I also haven't looked at spaCy's native entity linker, but that would be the natural place to start. If someone wanted to propose an integration with an existing entity linking library, along with some evaluation, I'd be happy to have a look!
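To make the distinction above concrete, here is a minimal, stdlib-only sketch of the two stages: a candidate generator that matches a mention against entity names by character n-gram similarity, followed by a reranker that compares the surrounding context with each entity's definition. This is not scispacy's code or API. The knowledge base, its IDs and definitions, and the function names (`generate_candidates`, `rerank`) are all hypothetical, and simple word overlap stands in for what would be a trained model.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: IDs, names, and definitions are invented
# for illustration and do not come from UMLS or scispacy.
KB = {
    "C001": {"name": "cold temperature",
             "definition": "low ambient temperature exposure weather"},
    "C002": {"name": "common cold",
             "definition": "viral infection of the upper respiratory tract "
                           "with cough and runny nose"},
}


def char_ngrams(text, n=3):
    """Bag of character n-grams, padded with a space on each side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def generate_candidates(mention, k=5, threshold=0.1):
    """Stage 1: score entity names against the mention string only."""
    mention_vec = char_ngrams(mention)
    scored = [(cid, cosine(mention_vec, char_ngrams(entry["name"])))
              for cid, entry in KB.items()]
    scored = [pair for pair in scored if pair[1] >= threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def rerank(context, candidates):
    """Stage 2: reorder candidates by context/definition word overlap."""
    context_vec = Counter(context.lower().split())

    def context_score(candidate):
        definition = KB[candidate[0]]["definition"]
        return cosine(context_vec, Counter(definition.lower().split()))

    return sorted(candidates, key=context_score, reverse=True)


candidates = generate_candidates("cold")
best = rerank("exposure to cold weather", candidates)[0][0]
print(candidates[0][0], best)
```

In this toy setup the string-only stage puts "common cold" first for the mention "cold", while the context-aware stage prefers "cold temperature" when the sentence is about weather. A trained linker would replace the word-overlap score with a learned one.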
Thanks @dakinggg. We may do some experiments to that end at some point.
Feel free to close this out if you'd like.
Silly question, but are there any "plug and play" options for using other models for the linking? I need the UMLS entities, and scispacy is the only tool I know of that supports them. I don't know how to train a new model from scratch. I would love to use another, "stronger" linker and/or NER model (e.g. ML-based), though, if it can be plugged into the pipeline.
I was wondering if you all had any thoughts on developments in the Python ecosystem for entity linking since the original 2019 ScispaCy paper. I see a number of comparisons in it to MetaMap, some of the early code on adding native entity linking (https://github.com/allenai/scispacy/pull/72, https://github.com/allenai/scispacy/pull/88), and then lots of more recent requests for extensions to linking like https://github.com/allenai/scispacy/issues/428, https://github.com/allenai/scispacy/issues/346, and https://github.com/allenai/scispacy/issues/331. All of that got me curious as to whether you all are aware of, or have considered, other methods/libraries for NEN.

Gilda (paper) is one we've been experimenting with, since its dependencies are relatively light and its offline execution time is definitely good enough for high-throughput NER + NEN pipelines. The sources/ontologies it covers include a number that make a useful complement to what's supported in ScispaCy:
I'm not really advocating for it, and I can't speak to the validity of the methodological differences between it and what you all have done, but I am curious whether you all are watching other projects like this and considering either integrating them or pointing users at them if they are looking for support for more nomenclatures/ontologies.
Thanks in advance for any insight here and for all the work you've done on this library!