clulab / eidos

Machine reading system for World Modelers
Apache License 2.0

Migration events need to be exported as JSONLD #594

Closed: kwalcock closed this issue 4 years ago

kwalcock commented 5 years ago

The details follow below.

kwalcock commented 5 years ago

It seems that the values marked with question marks in the bold row of the table below still need to be specified. Italics mean that the values are optional.

| Mention class (Scala) | Mention trigger (Scala) | Mention arguments (Scala) | Mention label (Scala) | Extraction type (JSON-LD) | Extraction subtype (JSON-LD) | Argument type (JSON-LD) |
| --- | --- | --- | --- | --- | --- | --- |
| TextBound | - | - | N/A | concept | entity | - |
| Event | TextBound | cause, effect | Causal | relation | causation | source, destination |
| Event | TextBound | cause, effect | Correlation | relation | correlation | argument, argument |
| **Event** | **TextBound** | **group, moveTo, moveFrom, moveThrough, timeStart, timeEnd, time** | **HumanMigration** | **relation?** | **migration?** | **group?, moveTo?, moveFrom?, moveThrough?, timeStart?, timeEnd?, time?** |
| Relation | - | true | N/A | - | - | - |
| CrossSentence* | - | cause, effect | Coreference | relation | coreference | anchor, reference |

kwalcock commented 5 years ago

The above might be discussed by people like @BeckySharp, @zupon, and @MihaiSurdeanu.
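
A minimal sketch of how the Event rows of the table above could be written down as a lookup. It is in Python purely for illustration (Eidos itself is Scala); `JSONLD_SPEC` and `to_jsonld_fields` are hypothetical names, and the HumanMigration values are the proposed ones marked with question marks, not a settled part of the exporter.

```python
# Hypothetical lookup: Scala mention label -> JSON-LD extraction fields.
JSONLD_SPEC = {
    "Causal": {
        "type": "relation",
        "subtype": "causation",
        "arguments": {"cause": "source", "effect": "destination"},
    },
    "Correlation": {
        "type": "relation",
        "subtype": "correlation",
        "arguments": {"cause": "argument", "effect": "argument"},
    },
    "HumanMigration": {
        # Proposed values (the ones with "?" in the table).
        "type": "relation",
        "subtype": "migration",
        # Proposed: argument names pass through unchanged.
        "arguments": {
            name: name
            for name in (
                "group", "moveTo", "moveFrom", "moveThrough",
                "timeStart", "timeEnd", "time",
            )
        },
    },
}


def to_jsonld_fields(label, scala_argument_names):
    """Translate a mention label and its argument names into JSON-LD fields."""
    spec = JSONLD_SPEC[label]
    return {
        "type": spec["type"],
        "subtype": spec["subtype"],
        "argumentTypes": [spec["arguments"][n] for n in scala_argument_names],
    }


print(to_jsonld_fields("HumanMigration", ["moveTo", "moveFrom"]))
```

Since every HumanMigration argument is optional per the table, the sketch takes whatever argument names are present rather than requiring all seven.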

MihaiSurdeanu commented 5 years ago

I agree with this spec.

Importantly: at some point (probably after Eidos) events with partial information that come from different sentences will have to be merged in a single Migration event. For example, one sentence might specify how many people were displaced. Another where they moved. Should this JSON output happen after the merge? It seems to me that the answer is yes. @BeckySharp?

kwalcock commented 5 years ago

Here's an early example:

   } ],
  "extractions" : [ {
    "@type" : "Extraction",
    "@id" : "_:Extraction_1",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Time", "EntityModifier", "Event" ],
    "text" : "the beginning of September 2016",
    "rule" : "time-stanford",
    "canonicalName" : "the beginning of September 2016",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "documentCharPositions" : [ {
        "@type" : "Interval",
        "start" : 6,
        "end" : 36
      } ],
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "sentenceWordPositions" : [ {
        "@type" : "Interval",
        "start" : 2,
        "end" : 6
      } ]
    } ]
  }, {
    "@type" : "Extraction",
    "@id" : "_:Extraction_2",
    "type" : "relation",
    "subtype" : "migration",
    "labels" : [ "HumanMigration", "Event" ],
    "text" : "the beginning of September 2016, almost 40,000 refugees arrived in Ethiopia from South Sudan as of mid-November",
    "rule" : "migration-verbs1",
    "canonicalName" : "the beginning of September 2016 refugee arrive Ethiopia South Sudan mid-November",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "documentCharPositions" : [ {
        "@type" : "Interval",
        "start" : 6,
        "end" : 116
      } ],
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "sentenceWordPositions" : [ {
        "@type" : "Interval",
        "start" : 2,
        "end" : 19
      } ]
    } ],
    "trigger" : {
      "@type" : "Trigger",
      "text" : "arrived",
      "provenance" : [ {
        "@type" : "Provenance",
        "document" : {
          "@id" : "_:Document_1"
        },
        "documentCharPositions" : [ {
          "@type" : "Interval",
          "start" : 62,
          "end" : 68
        } ],
        "sentence" : {
          "@id" : "_:Sentence_1"
        },
        "sentenceWordPositions" : [ {
          "@type" : "Interval",
          "start" : 11,
          "end" : 11
        } ]
      } ]
    },
    "arguments" : [ {
      "@type" : "Argument",
      "type" : "moveTo",
      "value" : {
        "@id" : "_:Extraction_5"
      }
    }, {
      "@type" : "Argument",
      "type" : "moveFrom",
      "value" : {
        "@id" : "_:Extraction_6"
      }
    }, {
      "@type" : "Argument",
      "type" : "timeStart",
      "value" : {
        "@id" : "_:Extraction_1"
      }
    }, {
      "@type" : "Argument",
      "type" : "timeEnd",
      "value" : {
        "@id" : "_:Extraction_7"
      }
    } ]
  }, {
    "@type" : "Extraction",
    "@id" : "_:Extraction_3",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "almost 40,000 refugees",
    "rule" : "simple-np",
    "canonicalName" : "refugee",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "documentCharPositions" : [ {
        "@type" : "Interval",
        "start" : 39,
        "end" : 60
      } ],
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "sentenceWordPositions" : [ {
        "@type" : "Interval",
        "start" : 8,
        "end" : 10
      } ]
    } ]
  }, {
    "@type" : "Extraction",
    "@id" : "_:Extraction_4",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Concept", "Entity" ],
    "text" : "arrived",
    "rule" : "simple-vp",
    "canonicalName" : "arrive",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "documentCharPositions" : [ {
        "@type" : "Interval",
        "start" : 62,
        "end" : 68
      } ],
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "sentenceWordPositions" : [ {
        "@type" : "Interval",
        "start" : 11,
        "end" : 11
      } ]
    } ],
    "states" : [ {
      "@type" : "State",
      "type" : "LocationExp",
      "text" : "Ethiopia",
      "value" : {
        "@id" : "_:GeoLocation_1"
      }
    }, {
      "@type" : "State",
      "type" : "LocationExp",
      "text" : "South Sudan",
      "value" : {
        "@id" : "_:GeoLocation_2"
      }
    } ]
  }, {
    "@type" : "Extraction",
    "@id" : "_:Extraction_5",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Location", "EntityModifier", "Event" ],
    "text" : "Ethiopia",
    "rule" : "location-nn",
    "canonicalName" : "Ethiopia",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "documentCharPositions" : [ {
        "@type" : "Interval",
        "start" : 73,
        "end" : 80
      } ],
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "sentenceWordPositions" : [ {
        "@type" : "Interval",
        "start" : 13,
        "end" : 13
      } ]
    } ]
  }, {
    "@type" : "Extraction",
    "@id" : "_:Extraction_6",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Location", "EntityModifier", "Event" ],
    "text" : "South Sudan",
    "rule" : "location-nn",
    "canonicalName" : "South Sudan",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "documentCharPositions" : [ {
        "@type" : "Interval",
        "start" : 87,
        "end" : 97
      } ],
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "sentenceWordPositions" : [ {
        "@type" : "Interval",
        "start" : 15,
        "end" : 16
      } ]
    } ]
  }, {
    "@type" : "Extraction",
    "@id" : "_:Extraction_7",
    "type" : "concept",
    "subtype" : "entity",
    "labels" : [ "Time", "EntityModifier", "Event" ],
    "text" : "mid-November",
    "rule" : "time-stanford",
    "canonicalName" : "mid-November",
    "provenance" : [ {
      "@type" : "Provenance",
      "document" : {
        "@id" : "_:Document_1"
      },
      "documentCharPositions" : [ {
        "@type" : "Interval",
        "start" : 105,
        "end" : 116
      } ],
      "sentence" : {
        "@id" : "_:Sentence_1"
      },
      "sentenceWordPositions" : [ {
        "@type" : "Interval",
        "start" : 19,
        "end" : 19
      } ]
    } ]
  } ]
}
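
For anyone consuming this output, here is a small Python sketch (not part of Eidos) of how the `arguments` of a migration extraction can be resolved back to text by following the `@id` references. The `SAMPLE` data is abridged by hand from the example above, and `resolve_arguments` is a hypothetical helper name.

```python
# Abridged by hand from the example above: only the fields used here.
SAMPLE = {
    "extractions": [
        {"@id": "_:Extraction_1", "text": "the beginning of September 2016"},
        {
            "@id": "_:Extraction_2",
            "type": "relation",
            "subtype": "migration",
            "arguments": [
                {"type": "moveTo", "value": {"@id": "_:Extraction_5"}},
                {"type": "moveFrom", "value": {"@id": "_:Extraction_6"}},
                {"type": "timeStart", "value": {"@id": "_:Extraction_1"}},
                {"type": "timeEnd", "value": {"@id": "_:Extraction_7"}},
            ],
        },
        {"@id": "_:Extraction_5", "text": "Ethiopia"},
        {"@id": "_:Extraction_6", "text": "South Sudan"},
        {"@id": "_:Extraction_7", "text": "mid-November"},
    ]
}


def resolve_arguments(document, subtype="migration"):
    """Yield (extraction id, [(argument type, argument text), ...]) pairs."""
    # Index every extraction by its blank-node id so references can be followed.
    by_id = {e["@id"]: e for e in document["extractions"]}
    for extraction in document["extractions"]:
        if extraction.get("subtype") != subtype:
            continue
        resolved = [
            (arg["type"], by_id[arg["value"]["@id"]]["text"])
            for arg in extraction.get("arguments", [])
        ]
        yield extraction["@id"], resolved


for event_id, args in resolve_arguments(SAMPLE):
    print(event_id, args)
```

Note that in this early example the "almost 40,000 refugees" concept (`_:Extraction_3`) is not referenced by any `group` argument of the migration event.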
BeckySharp commented 5 years ago

> I agree with this spec.
>
> Importantly: at some point (probably after Eidos) events with partial information that come from different sentences will have to be merged in a single Migration event. For example, one sentence might specify how many people were displaced. Another where they moved. Should this JSON output happen after the merge? It seems to me that the answer is yes. @BeckySharp?

yes -- after the "merge" (i.e., after we aggregate the pieces).
FYI - assuming we can successfully run the time normalization code, we will likely only have a single time argument, and some of the args will be optional, even if we output after aggregation. @MihaiSurdeanu I guess we need to decide how incomplete an event can be before we prune it?
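
To make the "merge" concrete, here is a deliberately naive sketch in Python (Eidos itself is Scala, and no aggregation code exists at this point in the thread). It assumes some earlier step has already decided which partial events describe the same migration; `merge_partial_events` and the dict-of-roles representation are hypothetical.

```python
def merge_partial_events(partial_events):
    """Union the arguments of partial events, keeping the first value seen per role."""
    merged = {}
    for event in partial_events:
        for role, value in event.items():
            # setdefault keeps an earlier value if the role was already filled.
            merged.setdefault(role, value)
    return merged


# One sentence gives the group, another gives where they moved.
first = {"group": "almost 40,000 refugees"}
second = {"moveTo": "Ethiopia", "moveFrom": "South Sudan"}
print(merge_partial_events([first, second]))
```

Keeping the first value per role is only one possible policy; a real aggregation step would also need to handle conflicting values, which this sketch silently ignores.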

MihaiSurdeanu commented 5 years ago

@BeckySharp: I suggest that we keep events (after the merge) that have at least the group, and (at least one of?) moveFrom and moveTo. What do you think?
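
The proposed rule can be sketched as a predicate. This is Python for illustration only; `keep_event` and the `require_both_locations` flag are hypothetical, with the flag standing in for the open "(at least one of?)" question.

```python
def keep_event(roles, require_both_locations=False):
    """Return True if a merged migration event has enough arguments to keep."""
    roles = set(roles)
    # The group is required under either reading of the rule.
    if "group" not in roles:
        return False
    locations = {"moveFrom", "moveTo"} & roles
    if require_both_locations:
        return len(locations) == 2
    return len(locations) >= 1


print(keep_event(["group", "moveTo"]))            # kept under the lenient reading
print(keep_event(["moveTo", "moveFrom", "time"]))  # no group, so pruned
```

Under either reading, the event in the early JSON example above would be pruned unless aggregation supplies a `group`, since its arguments are only `moveTo`, `moveFrom`, `timeStart`, and `timeEnd`.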

@kwalcock: the locations and times do not seem to include normalization info in the JSON. Does this mean that part has not been integrated yet? Just asking.

all: will we be able to view these events in the web demo?

kwalcock commented 5 years ago

I think the answer is no, they haven't been integrated yet. I was just working with what was available at the time.

BeckySharp commented 5 years ago

@MihaiSurdeanu I agree! re: the web demo, do you mean the webapp at the hackathon? if so, yes, I think we already can on @zupon 's branch. I can add some color stuff if it helps. If you mean something more, please let me know. Also, IDK what amt of "aggregation" will be ready before the hackathon, but hopefully @zupon can have a proof of concept ready or maybe planned?

Thoughts?

MihaiSurdeanu commented 5 years ago

This vanilla web app would be great! No customization needed. Can you please let me know which branch I should use, and any special instructions I need?

We can work on the aggregation during the hackathon, but it would be great if we had the skeleton, i.e., the API, connected into Eidos before.

BeckySharp commented 5 years ago

cool -- as I recall we added the hook/stub for the API (not in the postProcessorStep way bc that half of the refactor hasn't happened)

@zupon can you please show @MihaiSurdeanu the branch, suggest a query or 2 that work well in the webapp to show the migration events, and point him to the stub for aggregating (if I am remembering correctly that we did it...) -- meaning you don't already need it implemented, btw

thanks!

zupon commented 5 years ago

@MihaiSurdeanu the main branch to check out the migration stuff is migration_schema-zupon. That's where I have been writing the rules to capture the events.

Here are a couple of queries that demonstrate the kinds of things we can get so far. The first one is a really good example. The second one is much more complicated and shows that we still have some work to do.

@maxaalexeeva and I have started working on making the attachments for Geolocations. We've prototyped moveTo and can probably easily add the other locations and possibly even the times before the hackathon. We have not yet done anything for cross-sentence aggregation though. You can see our prototype for moveTo here: https://github.com/clulab/eidos/blob/migration_schema-Masha/src/main/scala/org/clulab/wm/eidos/actions/MigrationUtils.scala

maxaalexeeva commented 5 years ago

@MihaiSurdeanu in the migration_schema-Masha branch (the link Andrew sent), the method copyWithNewArgs is something we found elsewhere and will either need to be imported in some nicer way or moved to a better place. The processMigrationEvents method is similar to this action: https://github.com/clulab/eidos/blob/830bc1d63dc572323a0bc8b04f28f6f82621740f/src/main/scala/org/clulab/wm/eidos/EidosActions.scala#L339

Does this output look like what we need?

[Screenshot from 2019-05-31 16-18-31]

MihaiSurdeanu commented 5 years ago

Thanks @zupon and @maxaalexeeva!

This looks good. But why do you need the method copyWithNewArgs here? Also, what do I run to get this text output?

maxaalexeeva commented 5 years ago

@MihaiSurdeanu that's from running sentences in the webapp. The method is there because we didn't know a better way to add an attachment to an argument within an eventMention.

MihaiSurdeanu commented 5 years ago

Thanks! I am playing with the grammar now. @zupon: please add unit tests for all the sentences that are currently covered by grammar rules. @kwalcock: is there a testing framework for these custom events? Thanks!

kwalcock commented 5 years ago

PR #593, which doesn't yet seem to have been merged into Andrew's branch, includes a testing framework and tests all the related grammar rules available at the time. It's in the file TestMigrationSchema.scala. @zupon, have you had the chance to check it out? I've been assuming that github notifies you of the request on your branch.

MihaiSurdeanu commented 5 years ago

Thanks @kwalcock !

zupon commented 5 years ago

Hi all,

I haven't seen anything from GitHub about it, so I haven't merged anything. I will have time to check it out later today.

AZ

On Thu, Jun 6, 2019, 13:39 Mihai Surdeanu wrote:

> Thanks @kwalcock!


MihaiSurdeanu commented 5 years ago

Thanks @zupon! This is important, because we will communicate through these unit tests during the hackathon and beyond.

zupon commented 5 years ago

For some reason I am not getting the github notifications about requests on my branch, so manually mentioning me in a comment will be a good way to get my attention until I get that resolved.

I noticed that for this PR there are failing tests, but they don't seem to be related to the migration tests that @kwalcock wrote for me. Does that sound right? Is it alright to merge the PR with the failing tests into my branch?

Also, is there a command to just run the migration-related tests, rather than running everything together?

MihaiSurdeanu commented 5 years ago

Yes, please merge. The failures do not come from you.

On June 6, 2019 at 3:02:19 PM, zupon wrote:

> For some reason I am not getting the github notifications about requests on my branch, so manually mentioning me in a comment will be a good way to get my attention until I get that resolved.
>
> I noticed that for this PR there are failing tests, but they don't seem to be related to the migration tests that @kwalcock wrote for me. Does that sound right? Is it alright to merge the PR with the failing tests into my branch?
>
> Also, is there a command to just run the migration-related tests, rather than running everything together?


kwalcock commented 5 years ago

I usually do it in IntelliJ with some clicks, but it is possible in sbt with

testOnly org.clulab.wm.eidos.text.english.TestMigrationSchema

zupon commented 5 years ago

Thanks @kwalcock ! And thanks also for writing the tests!

MihaiSurdeanu commented 5 years ago

@zupon: can you please add unit tests for covered sentences in a new file, e.g., TestMigrationEvents?

zupon commented 5 years ago

@MihaiSurdeanu Can you please clarify what you mean? Do you want me to move the existing tests to a different file/location, add new tests to a new file, or something else?

MihaiSurdeanu commented 5 years ago

@zupon: nm. I forgot to pull. It's all good. Thanks all!

maxaalexeeva commented 5 years ago

@MihaiSurdeanu, a follow up on attachments:

Here (Masha's branch with Andrew's merged): https://github.com/clulab/eidos/blob/8861627c89c6c4232d685228a93aa6c3be690a17/src/main/scala/org/clulab/wm/eidos/actions/MigrationUtils.scala#L14 we try to make attachments for the args in the human migration events. Here's the output we get in the webapp for this type of event:

Image 1: [Screenshot from 2019-06-06 20-40-50]

However, among the concepts, we see output like this (time attached to groups and actions):

Image 2: [Screenshot from 2019-06-06 20-41-08]

The time attachments seem to be originating from here: https://github.com/clulab/eidos/blob/830bc1d63dc572323a0bc8b04f28f6f82621740f/src/main/scala/org/clulab/wm/eidos/EidosActions.scala#L326

Questions:

1) should geolocations be attached to location mentions (like in image 1) or to some other arguments in the migration event?

2) do the time attachments look right based on image 2?

3) is there a way to check if we ended up attaching the right thing? The webapp output does not tell us that it attached geolocations to mentions; it just says that the type of attachment is location.

Here's the sentence typed up in case anyone wants to try running the code (note: the branch where we are trying to make attachments is migration_schema-Masha): Since the beginning of September 2016, almost 40,000 refugees arrived in Ethiopia from South Sudan as of mid-November.

MihaiSurdeanu commented 5 years ago

@maxaalexeeva:

  1. I think attachments should be attached directly to location mentions. @BeckySharp, what do you think?
  2. The span of the time entity seems fine, but I can't tell if it's been normalized. Has it?
  3. I think the web app should be expanded to show this. @BeckySharp, who should do it?
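
Until the web app shows this, one rough way to check what was attached where is to inspect the JSON-LD export directly. The sketch below is Python and not part of Eidos; it assumes attachments show up under a `states` key, as in the early example earlier in this thread, where the two `LocationExp` states sit on the "arrived" extraction rather than on the location mentions. `states_by_extraction` is a hypothetical helper, and `sample` is abridged by hand from that example.

```python
def states_by_extraction(document, state_type=None):
    """Map each extraction's text to the (type, text) pairs of its states."""
    report = {}
    for extraction in document["extractions"]:
        states = [
            (s["type"], s["text"])
            for s in extraction.get("states", [])
            if state_type is None or s["type"] == state_type
        ]
        # Only report extractions that actually carry a matching state.
        if states:
            report[extraction["text"]] = states
    return report


# Abridged by hand from the early example in this thread.
sample = {
    "extractions": [
        {
            "text": "arrived",
            "states": [
                {"type": "LocationExp", "text": "Ethiopia"},
                {"type": "LocationExp", "text": "South Sudan"},
            ],
        },
        {"text": "Ethiopia"},
        {"text": "South Sudan"},
    ]
}
print(states_by_extraction(sample, "LocationExp"))
```

If geolocations were attached directly to the location mentions, the report would be keyed by "Ethiopia" and "South Sudan" instead of "arrived".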