Closed ldorigo closed 4 years ago
Hey -- I haven't looked at spacy pipelines, and I feel this would be a big effort to integrate. To be honest, I don't think the dependency on spacy is very crucial for this package, as it is only used for POS tagging; if anything, I think it would be good to move away for it to some more lightweight parser and enhance throughput.
Actually, I've given a look into it (I am using both QuickUMLS and Spacy for a project and I needed both to interoperate), and it's much easier than I thought. The work is almost done, I need to polish it a bit and will submit a pull request later today. Then you can see if you think it's relevant to add it :-)
Regarding performance, from the profiling I've done, more than 99% of the time is spent on entity matching, so I don't think spacy introduces any noticeable delay.
Sounds good -- thank you for experimenting with this! Looking forward to it.
I have some code working for this so that QuickUMLS is a custom component in a spaCy pipeline. I'll try to set up a pull request this week.
Any news from this issue ? Thanks
Hi @ljak!
I do not have any bandwidth to support this work, but I'd be more than happy to review and merge contributions from either @burgersmoke or @ldorigo.
-Luca
Same for me! It's been a long time already, that's why I'm asking. If it is something that needs to be done, I could help as well.
OK, this is close. I've been working on cleaning this up tonight so that it can work within QuickUMLS 1.4. I'm hoping I can have a pull request by the end of the weekend.
OK. I have a very early pull request ready. There are some arguments hard-coded which could be easily fixed, but this should give the overall idea so that this can be used as a component in a pipeline rather than as its own pipeline.
I've been using this in syndromic surveillance for the last 1.5+ years with a preprocessor used to normalize chief complaint text (e.g. "n/v/d c/o cp") into more standard lexical forms before this gets passed to QuickUMLS. We've even been experimenting with other components which run after QuickUMLS. We've got great visions, but I'm sorry it's taken so long.
I wish I could have done this earlier, but since I work in syndromic surveillance, I hope you will forgive me that COVID-19 got in the way before prepping this pull request.
I welcome all feedback as I know this can be improved.
@ldorigo what do you think about the pull request that I put here? Is this close to your thinking? There are many ways to potentially do this, so I am open to any thoughts.
@burgersmoke,
This is awesome, thank you for putting the work! I'll set time aside to look at the PR during the weekend :)
-Luca
I had a chance to review changes, and left a couple of minor comments. Otherwise, it looks awesome!
Hi there.
Since you already use Spacy internally, I am wondering if there would be some good way to make the named entity resolver into a Spacy pipeline component?
It would make it much easier to integrate it with other NLP tasks.
I am currently looking into it but I'm very new to Spacy, so if you have suggestions, please let me know.