Closed e3oroush closed 2 years ago
Very cool, thanks for doing this! I'll take a look over the weekend.
I found a problem regarding version conflict of spaCy and allennlp. If one simply perform the pip install -r requirements.txt
wouldn't work, because spaCy 3 and allennlp 1.1 have version conflict. But I was able to install spaCy==3.0.0 after running pip install -r requirments.txt
, I will try to fix this issue over the weekend.
My last commit should resolve the problem.
Thanks! The next two weeks are a bit hectic, but I'll review this after Dec 17 and let you know if I've got any questions.
I just gave your Jupyter notebook a try, and it worked. Very cool! I'll merge. I'll also add something about this in the README. Is it OK if I instruct people to @
you for issues involving the spacy interface?
Also - if you have time, could you submit a PR with a test confirming that the Spacy outputs match the outputs from the conventional DyGIE inference pipeline (modulo nested entities, which you've removed)?
For addressing me, yes please do so, I would gladly help.
For the test file, I'm not sure if I got your point correctly, what did you mean by modulo nested entities
? Where did I remove what? :smile:
OK, I'll update the README.
Haha sorry, I wasn't super-clear. I'm referring to this comment in your PR, where it appears that you remove overlapping entities. Because of this, I assume the spacy outputs won't always be identical to the outputs of the original DyGIE inference?
Aha, yes exactly. Because you are using span-based entity detection, it doesn't work natively with named entity module of spaCy. I will write a unit test to show this issue.
One quick question about your opinion, do you think if it's worth to define the detected entities as spaCy spans and store them in the Document
object (e.g. doc._.span_ents
) as the list of span based entities? Although it wouldn't be the official spaCy entity anymore, but we don't have to remove some of the detected entities. :smiley:
Something like this:
Hmm I don't have a strong opinion on storing them as spans vs. entities. How about both ways, if it's not too much more work?
No I don't think so, it just didn't come to me when I was developing it. I will try to add them as well as the unit test, over the weekend.
Hi I wanted to have a simple, easy-to-use interface outside of the allennlp CLI for this brilliant library available.
So I developed this PR that creates a new pipe line to the spaCy library and gives you all the extracted entities and relations inside spacy output document.
For usage example, there is a notebook to see how it can be used and the final outcome looks like.