Closed jeremytanjianle closed 4 years ago
Looks like a cool dataset.
You're correct, DyGIE can only handle within-sentence events. Simplest way around this: just treat your entire document as a single sentence. As a simple example, instead of this:
{"sentences": [["Here's", "a", "sentence", "."], ["Here's", "another"]]}
do this:
{"sentences": [["Here's", "a", "sentence", ".", "Here's", "another"]]}
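If your data is already in the DyGIE format, a small script can do the merge. Here's a minimal sketch, assuming annotation fields such as ner, relations, and events are stored as one list per sentence and use document-level token indices (check data.md for the version you're on). merge_sentences is a hypothetical helper written for this example, not part of DyGIE:

```python
import json


def merge_sentences(doc, per_sentence_fields=("ner", "relations", "events")):
    """Collapse a multi-sentence document into one long "sentence".

    Assumption: every field named in per_sentence_fields holds one list per
    sentence, and its token indices are document-level, so the indices stay
    valid after flattening.
    """
    merged = dict(doc)  # shallow copy; leaves the input untouched
    # Concatenate all tokens into a single sentence.
    merged["sentences"] = [[tok for sent in doc["sentences"] for tok in sent]]
    # Concatenate the per-sentence annotation lists the same way.
    for field in per_sentence_fields:
        if field in doc:
            merged[field] = [[item for sent_items in doc[field] for item in sent_items]]
    return merged


doc = {"sentences": [["Here's", "a", "sentence", "."], ["Here's", "another"]]}
print(json.dumps(merge_sentences(doc)))
```

Fields not listed in per_sentence_fields (doc_key, for instance) are copied over unchanged.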
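The issue you'll run into here is that DyGIE makes event predictions by scoring token / span pairs.
The number of token / span pairs scales as O(n^3), where n is sentence length (n tokens times O(n^2) candidate spans). This gets bad quickly once a whole document is a single sentence. To deal with this, you can modify the config: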
trigger_spans_per_word and argument_spans_per_word, here: https://github.com/dwadden/dygiepp/blob/allennlp-v1/training_config/template.libsonnet#L99. These specify the number of trigger and argument candidates to generate, as a fraction of the number of words in the sentence (longer sentences get more candidates).
max_span_width, here: https://github.com/dwadden/dygiepp/blob/allennlp-v1/training_config/template.libsonnet#L32. This also reduces the number of spans.
It will be easier to work with the allennlp-v1 branch. There's info on how to modify these elements of the config here: https://github.com/dwadden/dygiepp/blob/allennlp-v1/doc/config.md#changing-arbitrary-parts-of-the-template.
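To get a feel for how much these settings matter before you train, here's a rough back-of-the-envelope sketch. It only does the counting arithmetic and is not DyGIE code. The function names and the 0.5 values are made up for illustration, and the real model may round or prune differently:

```python
def count_spans(n_tokens, max_span_width=None):
    """Number of contiguous spans of width 1..max_span_width in a sentence."""
    width = n_tokens if max_span_width is None else min(max_span_width, n_tokens)
    # There are (n_tokens - k + 1) spans of width k.
    return sum(n_tokens - k + 1 for k in range(1, width + 1))


def count_token_span_pairs(n_tokens, max_span_width=None):
    """Every token paired with every candidate span: O(n^3) with no width cap."""
    return n_tokens * count_spans(n_tokens, max_span_width)


def count_pruned_pairs(n_tokens, trigger_spans_per_word, argument_spans_per_word):
    """Pairs left if candidates are kept as a fraction of the word count (assumed)."""
    n_triggers = int(trigger_spans_per_word * n_tokens)
    n_arguments = int(argument_spans_per_word * n_tokens)
    return n_triggers * n_arguments


for n in (30, 300):  # a typical sentence vs. a document treated as one sentence
    print(n,
          count_token_span_pairs(n),
          count_token_span_pairs(n, max_span_width=8),
          count_pruned_pairs(n, 0.5, 0.5))
```

Capping the span width turns the cubic growth into roughly n^2 times the width, and lowering the per-word fractions shrinks the pair count further, which is why both knobs help with long inputs.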
Let me know if this doesn't work.
Thanks very much, this is helpful.
Will take some time to test it out thoroughly, so I'll just close this for now.
OK, sounds good. If this doesn't work for you, let me know and we can try some other approach.
Hi, I'd like to apply DyGIE++ on the Roles Across Multiple Sentences (RAMS) dataset.
In the RAMS dataset, the event triggers and arguments may be in separate sentences. For example, the trigger could be in sentence 3, but the victim and killer are in sentence 4.
But looking at data.md, it seems like the data format requires the trigger and arguments to be in the same sentence. Is DyGIE++ capable of processing event extraction across sentences?