dwadden / dygiepp

Span-based system for named entity, relation, and event extraction.
MIT License

Apply on Roles & Triggers across sentences. #38

Closed: jeremytanjianle closed this issue 4 years ago

jeremytanjianle commented 4 years ago

Hi, I'd like to apply DyGIE++ on the Roles Across Multiple Sentences (RAMS) dataset.

In the RAMS dataset, the event triggers and arguments may be in separate sentences. For example, the trigger could be in sentence 3, but the victim and killer are in sentence 4.

But looking at data.md, it seems the data format requires the trigger and arguments to be in the same sentence. Is DyGIE++ capable of processing event extraction across sentences?

dwadden commented 4 years ago

Looks like a cool dataset.

You're correct, DyGIE can only handle within-sentence events. Simplest way around this: just treat your entire document as a single sentence. As a simple example, instead of this:

{"sentences": [["Here's", "a", "sentence", "."], ["Here's", "another"]]}

do this:

{"sentences": [["Here's", "a", "sentence", ".", "Here's", "another"]]}
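If it helps, the conversion above can be scripted. This is only a minimal sketch, not part of the DyGIE++ codebase: `merge_sentences` is a hypothetical helper, and it assumes that the annotation fields (`ner`, `relations`, `events`) hold one sublist per sentence with document-level token indices, so the sublists can be concatenated without shifting any offsets. Please check that assumption against data.md before relying on it.

```python
import json

# Annotation fields assumed to hold one sublist per sentence (all optional).
PER_SENTENCE_FIELDS = ("ner", "relations", "events")


def merge_sentences(doc):
    """Return a copy of `doc` with all sentences collapsed into one.

    Assumes token indices inside the annotations are document-level,
    so they stay valid after the sentences are concatenated.
    """
    merged = dict(doc)
    # Flatten the list of sentences into a single long "sentence".
    merged["sentences"] = [[tok for sent in doc["sentences"] for tok in sent]]
    # Flatten each per-sentence annotation list the same way.
    for field in PER_SENTENCE_FIELDS:
        if field in doc:
            merged[field] = [[item for sent in doc[field] for item in sent]]
    return merged


doc = {"sentences": [["Here's", "a", "sentence", "."], ["Here's", "another"]]}
merged = merge_sentences(doc)
print(json.dumps(merged))
```

For a file with one JSON document per line, the same function could be applied to each line in turn and the result written back out.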

The issue you'll run into here is that DyGIE makes event predictions by scoring pairs of candidate trigger tokens and candidate argument spans.

The number of token / span pairs scales as O(n^3), where n is sentence length: there are n tokens and O(n^2) spans. This gets bad quickly. To deal with this, you can modify the config:

It will be easier to work with the AllenNLP-V1 branch. There's info on how to modify these elements of the config here: https://github.com/dwadden/dygiepp/blob/allennlp-v1/doc/config.md#changing-arbitrary-parts-of-the-template.
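To get a rough feel for the scaling mentioned above, and for why limiting the candidates in the config matters, here is a back-of-the-envelope count. It is a sketch only: the function names and the `max_width` argument are made up for illustration, and the count ignores whatever pruning the model actually applies. It treats every token as a candidate trigger and every contiguous span (optionally capped in width) as a candidate argument.

```python
def count_spans(n_tokens, max_width=None):
    """Number of contiguous spans in a sentence, optionally width-capped."""
    if max_width is None or max_width > n_tokens:
        max_width = n_tokens
    # There are (n_tokens - width + 1) spans of each given width.
    return sum(n_tokens - width + 1 for width in range(1, max_width + 1))


def count_trigger_span_pairs(n_tokens, max_width=None):
    """Every token paired with every candidate span."""
    return n_tokens * count_spans(n_tokens, max_width)


# Compare a typical sentence length with a whole document treated as one sentence.
for n in (25, 250):
    uncapped = count_trigger_span_pairs(n)
    capped = count_trigger_span_pairs(n, max_width=8)
    print(f"n={n}: {uncapped} pairs uncapped, {capped} pairs with width <= 8")
```

Going from 25 to 250 tokens multiplies the uncapped count by nearly a thousand, while a cap on span width keeps the growth roughly quadratic.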

Let me know if this doesn't work.

jeremytanjianle commented 4 years ago

Thanks very much, this is helpful.

Will take some time to test it out thoroughly, so I'll just close this for now.

dwadden commented 4 years ago

OK, sounds good. If this doesn't work for you, let me know and we can try some other approach.