dwadden / dygiepp

Span-based system for named entity, relation, and event extraction.
MIT License

Can the model be applied to cross-sentence relation extraction? #83

Closed JHLiu7 closed 2 years ago

JHLiu7 commented 2 years ago

Hi there, thanks for the great work on this project! I read in the paper that the datasets dygie++ was evaluated on all contain relations within the same sentence. I'm wondering if that's just a characteristic of the data or if it means dygie++ is not specifically designed for cross-sentence RE. If I'd like to examine a dataset that contains many cross-sentence relations (e.g., in a long scientific document), what would be the best way for me to approach it with the repo? Would simply proceeding with the data formatting, as described in the repo, be sufficient? Thanks

dwadden commented 2 years ago

Hi,

The datasets I experimented on in the paper only have within-sentence relations (or I pre-processed them to only have in-sentence relations).

In principle you can run DyGIE to extract cross-sentence relations. The issue is just the runtime. DyGIE predicts relations by enumerating all pairs of spans up to a given length, which scales like O(n^4), where n is the length of the relevant context. If n is the length of a sentence this is OK, but if it's a full document that gets pretty big.
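To get a feel for that growth, here is a back-of-the-envelope sketch. It is not DyGIE's code: the function names and the `max_width` parameter are made up for illustration, and it ignores any candidate pruning the model may apply. It only counts contiguous spans and ordered span pairs for a context of `n` tokens.

```python
def num_spans(n: int, max_width: int) -> int:
    """Count contiguous token spans of width 1..max_width in a context of n tokens."""
    widest = min(max_width, n)
    # A span of width k can start at n - k + 1 different positions.
    return sum(n - k + 1 for k in range(1, widest + 1))


def num_span_pairs(n: int, max_width: int) -> int:
    """Count ordered pairs of candidate spans (a span paired with itself included)."""
    spans = num_spans(n, max_width)
    return spans * spans


if __name__ == "__main__":
    # Hypothetical sizes: a 30-token sentence versus a 5000-token document.
    for n in (30, 5000):
        print(n, num_spans(n, max_width=8), num_span_pairs(n, max_width=8))
```

With a fixed maximum width the pair count grows roughly quadratically in `n`; if the width limit is allowed to grow with `n`, the span count itself becomes quadratic and the pair count approaches the `n^4` order mentioned above. Either way, going from a sentence to a full document multiplies the number of pairs by several orders of magnitude.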

So, you can definitely try it, but if you find that either (1) you run out of GPU memory or (2) it takes too long to run, that's probably what's happening. If you run into this I can try to think about how you could modify the code to make it work. Let me know.

I haven't kept up with the literature on full-document relation extraction, but here's a couple references from a year or two ago that might be helpful:

JHLiu7 commented 2 years ago

Thanks so much for the quick and detailed reply! I'll try with my dataset and see how it goes by rendering the whole document as a sentence. Will also check out the references. Closing the issue now
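For anyone trying the same thing, here is a minimal sketch of merging a document into a single "sentence". It assumes the JSON layout from the repo's data formatting docs: `sentences`, `ner`, and `relations` are lists with one sublist per sentence, and token indices are relative to the document rather than the sentence. The `merge_sentences` helper and the example labels are hypothetical, so check the field names against your own data first.

```python
from typing import Any, Dict


def merge_sentences(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc whose per-sentence fields hold one merged sentence.

    Assumes token indices in "ner" and "relations" are document-level,
    so they remain valid after the sentences are concatenated.
    """
    merged = dict(doc)
    merged["sentences"] = [[tok for sent in doc["sentences"] for tok in sent]]
    for field in ("ner", "relations"):
        if field in doc:
            merged[field] = [[item for sent in doc[field] for item in sent]]
    return merged


# Hypothetical two-sentence document with made-up labels.
example = {
    "doc_key": "example",
    "sentences": [["Aspirin", "inhibits", "COX-1", "."],
                  ["It", "also", "reduces", "fever", "."]],
    "ner": [[[0, 0, "Drug"], [2, 2, "Protein"]], [[7, 7, "Symptom"]]],
    "relations": [[[0, 0, 2, 2, "Inhibits"]], []],
}
merged_example = merge_sentences(example)
print(merged_example["sentences"])
```

Once the document is a single sentence, a relation between tokens that were originally in different sentences (here, say, between token 0 and token 7) can be listed in the one remaining `relations` sublist. Expect the memory and runtime cost discussed earlier in the thread to apply.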