amrisi / amr-guidelines


ms-amr #228

Open timjogorman opened 6 years ago


MS-AMR format and release

I've got most of the MS-AMR release ready. All completed files have been checked against the latest snapshot from Ulf and tested for a range of issues: overlapping identity chains, changes in what the chains refer to, and consistency of chains containing "i" against the speaker IDs I've extracted from the ERE data. If anyone has ideas for additional things to test, let me know!

I've been using my own format for a few months, but have been trying to hammer out an easy, interpretable format for the release. A given document would have a simple name like "msamr-dfb-023.gold.xml" and have two sections. The first would be a declaration of what the "document" is: a list of the AMRs in the document, with the speaker and post IDs when available:

   <sentences annotator="anno7" docid="408dff173c599256711f23238e280c15" end="p53" site="LDC" sourcetype="DEFT" start="p49" threadid="bolt-eng-DF-200-192448-6191965">
      <amr id="bolt-eng-DF-200-192448-6191965_0049.1" order="0" post="p49" speaker="jb9191" su="1"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.2" order="1" post="p49" speaker="jb9191" su="2"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.3" order="2" post="p49" speaker="jb9191" su="3"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.4" order="3" post="p49" speaker="jb9191" su="4"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.5" order="4" post="p49" speaker="jb9191" su="5"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.6" order="5" post="p49" speaker="jb9191" su="6"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.7" order="6" post="p49" speaker="jb9191" su="7"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.8" order="7" post="p49" speaker="jb9191" su="8"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.9" order="8" post="p49" speaker="jb9191" su="9"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.10" order="9" post="p49" speaker="jb9191" su="10"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.11" order="10" post="p49" speaker="jb9191" su="11"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.12" order="11" post="p49" speaker="jb9191" su="12"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.13" order="12" post="p49" speaker="hollyone" su="13"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0049.14" order="13" post="p49" speaker="hollyone" su="14"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.1" order="14" post="p50" speaker="RNBen" su="15"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.3" order="15" post="p50" speaker="xnatalie01x" su="17"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.4" order="16" post="p50" speaker="xnatalie01x" su="18"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0050.5" order="17" post="p50" speaker="xnatalie01x" su="19"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.1" order="18" post="p51" speaker="Huskaris" su="21"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.2" order="19" post="p51" speaker="Huskaris" su="21"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.3" order="20" post="p51" speaker="Huskaris" su="22"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.4" order="21" post="p51" speaker="Huskaris" su="23"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.5" order="22" post="p51" speaker="Huskaris" su="24"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.6" order="23" post="p51" speaker="Huskaris" su="25"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0051.7" order="24" post="p51" speaker="Huskaris" su="26"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0052.1" order="25" post="p52" speaker="NeoNerd" su="27"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0052.2" order="26" post="p52" speaker="ed46" su="28"/>
      <amr id="bolt-eng-DF-200-192448-6191965_0053.1" order="27" post="p53" speaker="Arielle" su="29"/>
   </sentences>
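
To show how this first section could be consumed, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `load_sentences` and the two-entry sample are illustrative assumptions, not part of the release tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the proposed format (two <amr> entries from the example above).
SAMPLE = """
<sentences annotator="anno7" docid="408dff173c599256711f23238e280c15" end="p53" site="LDC" sourcetype="DEFT" start="p49" threadid="bolt-eng-DF-200-192448-6191965">
   <amr id="bolt-eng-DF-200-192448-6191965_0049.13" order="12" post="p49" speaker="hollyone" su="13"/>
   <amr id="bolt-eng-DF-200-192448-6191965_0049.1" order="0" post="p49" speaker="jb9191" su="1"/>
</sentences>
"""

def load_sentences(xml_text):
    """Return the <amr> entries as dicts, sorted by their integer 'order' attribute."""
    root = ET.fromstring(xml_text)
    rows = [dict(amr.attrib) for amr in root.iter("amr")]
    return sorted(rows, key=lambda row: int(row["order"]))

sentences = load_sentences(SAMPLE)

# Speaker IDs are only present "when available", so use .get() rather than [].
speaker_of = {s["id"]: s.get("speaker") for s in sentences}

for s in sentences:
    print(s["order"], s["id"], s["post"], s.get("speaker"))
```

Sorting on `order` as an integer matters because the attribute values are strings, and "10" would otherwise sort before "2".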

Then the identity chains are just explicitly marked as links between variables in each AMR document:

   <relations>
      <identity>
         <identchain relationid="rel-0">
            <mention concept="government-organization" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="g">Protection_Command</mention>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0049.3" parentconcept="take-01" parentvariable="t"/>
         </identchain>
         <identchain relationid="rel-1">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.10" variable="p2"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0051.1" variable="t6"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.4" variable="p"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0051.7" variable="t2"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0051.2" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.11" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.5" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.12" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.3" variable="p"/>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0051.6" variable="p"/>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0052.2" parentconcept="attack-01" parentvariable="a"/>
         </identchain>
         <identchain relationid="rel-2">
            <mention concept="hole" id="bolt-eng-DF-200-192448-6191965_0049.10" variable="h"/>
            <mention concept="country" id="bolt-eng-DF-200-192448-6191965_0049.9" variable="c"/>
         </identchain>
         <identchain relationid="rel-3">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.7" variable="p"/>
            <mention concept="police" id="bolt-eng-DF-200-192448-6191965_0049.6" variable="p"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0049.8" variable="t2"/>
         </identchain>
         <identchain relationid="rel-4">
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.11" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.5" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.10" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.7" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.6" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0049.12" variable="i"/>
         </identchain>
         <identchain relationid="rel-5">
            <mention concept="way" id="bolt-eng-DF-200-192448-6191965_0049.3" variable="w"/>
            <mention concept="route" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="r2"/>
         </identchain>
         <identchain relationid="rel-6">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="p3">Camilla_Duchess_of_Cornwall</mention>
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.13" variable="p">Camilla_Duchess_of_Cornwall</mention>
            <mention concept="she" id="bolt-eng-DF-200-192448-6191965_0049.14" variable="s"/>
         </identchain>
         <identchain relationid="rel-7">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0051.3" variable="p"/>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0051.4" parentconcept="avoid-01" parentvariable="a"/>
            <implicitrole argument="ARG0" id="bolt-eng-DF-200-192448-6191965_0051.5" parentconcept="get-04" parentvariable="g"/>
         </identchain>
         <identchain relationid="rel-8">
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0049.3" variable="t2"/>
            <mention concept="they" id="bolt-eng-DF-200-192448-6191965_0053.1" variable="t2"/>
         </identchain>
         <identchain relationid="rel-9">
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0051.1" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0051.7" variable="i"/>
            <mention concept="i" id="bolt-eng-DF-200-192448-6191965_0051.2" variable="i"/>
         </identchain>
      </identity>
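
Two of the checks mentioned earlier (overlapping identity chains, and speaker consistency for chains with "i") could be run directly on this section. The following is only a sketch of how that might look, not the scripts actually used for the release: the function names, the shortened `doc_...` ids, and the `speaker_of` mapping are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened, hypothetical AMR ids ("doc_...") stand in for the full ones.
SAMPLE = """
<identity>
   <identchain relationid="rel-4">
      <mention concept="i" id="doc_0049.5" variable="i"/>
      <mention concept="i" id="doc_0049.6" variable="i"/>
   </identchain>
   <identchain relationid="rel-7">
      <mention concept="person" id="doc_0051.3" variable="p"/>
      <implicitrole argument="ARG0" id="doc_0051.4" parentconcept="avoid-01" parentvariable="a"/>
   </identchain>
</identity>
"""

def node_key(el):
    """Key a chain member: (AMR id, variable) for a mention,
    (AMR id, parent variable, argument) for an implicit role."""
    if el.tag == "mention":
        return (el.get("id"), el.get("variable"))
    return (el.get("id"), el.get("parentvariable"), el.get("argument"))

def load_chains(xml_text):
    """Map each relationid to the list of member keys in that chain."""
    root = ET.fromstring(xml_text)
    return {c.get("relationid"): [node_key(m) for m in c]
            for c in root.iter("identchain")}

def overlapping_nodes(chains):
    """Return the members that appear in more than one chain."""
    owners = defaultdict(set)
    for rel, members in chains.items():
        for key in members:
            owners[key].add(rel)
    return {key: sorted(rels) for key, rels in owners.items() if len(rels) > 1}

def i_chain_speakers(xml_text, speaker_of):
    """For each chain containing "i" mentions, collect the speakers of those AMRs."""
    root = ET.fromstring(xml_text)
    speakers = {}
    for chain in root.iter("identchain"):
        ids = [m.get("id") for m in chain.iter("mention") if m.get("concept") == "i"]
        if ids:
            speakers[chain.get("relationid")] = {speaker_of.get(i) for i in ids}
    return speakers

chains = load_chains(SAMPLE)
# Hypothetical speaker lookup; in practice it would come from the <sentences> section.
speaker_of = {"doc_0049.5": "jb9191", "doc_0049.6": "jb9191"}

print("overlaps:", overlapping_nodes(chains))
print("mixed-speaker 'i' chains:",
      {rel: s for rel, s in i_chain_speakers(SAMPLE, speaker_of).items() if len(s) > 1})
```

An "i" chain that maps to more than one speaker is a candidate error, since first-person mentions from different posters should not corefer.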

Finally, we can encode set/member and part/whole relations. Any AMR variables those relations refer to that aren't already in a coreference chain are listed as singletons:

      <singletons>
         <identchain relationid="singleton-10">
            <mention concept="idiot" id="bolt-eng-DF-200-192448-6191965_0051.3" variable="i"/>
         </identchain>
         <identchain relationid="singleton-12">
            <mention concept="person" id="bolt-eng-DF-200-192448-6191965_0049.2" variable="p">Charles_Prince_of_Wales</mention>
         </identchain>
      </singletons>
      <bridging>
         <setmember relationid="rel-11">
            <superset id="rel-1"/>
            <member id="rel-7"/>
            <member id="singleton-10"/>
         </setmember>
         <setmember relationid="rel-13">
            <superset id="rel-8"/>
            <member id="singleton-12"/>
            <member id="rel-6"/>
         </setmember>
      </bridging>
   </relations>
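
One way to sanity-check the bridging section is to confirm that every `superset` and `member` id points at a chain or singleton declared elsewhere in the file. The sketch below assumes the element names shown above; `resolve_setmembers` and the shortened `doc_...` ids are hypothetical. It only handles `setmember`, since no part/whole element appears in this example.

```python
import xml.etree.ElementTree as ET

# Trimmed sample with shortened, hypothetical AMR ids ("doc_...").
SAMPLE = """
<relations>
   <identity>
      <identchain relationid="rel-1">
         <mention concept="person" id="doc_0049.10" variable="p2"/>
         <mention concept="they" id="doc_0051.1" variable="t6"/>
      </identchain>
      <identchain relationid="rel-7">
         <mention concept="person" id="doc_0051.3" variable="p"/>
      </identchain>
   </identity>
   <singletons>
      <identchain relationid="singleton-10">
         <mention concept="idiot" id="doc_0051.3" variable="i"/>
      </identchain>
   </singletons>
   <bridging>
      <setmember relationid="rel-11">
         <superset id="rel-1"/>
         <member id="rel-7"/>
         <member id="singleton-10"/>
      </setmember>
   </bridging>
</relations>
"""

def resolve_setmembers(xml_text):
    """Return ({relationid: (superset, [members])}, [ids with no matching identchain])."""
    root = ET.fromstring(xml_text)
    # Chains under <identity> and <singletons> share the same relationid namespace.
    known = {c.get("relationid") for c in root.iter("identchain")}
    resolved, dangling = {}, []
    for sm in root.iter("setmember"):
        superset = sm.find("superset").get("id")
        members = [m.get("id") for m in sm.findall("member")]
        resolved[sm.get("relationid")] = (superset, members)
        dangling += [ref for ref in [superset, *members] if ref not in known]
    return resolved, dangling

resolved, dangling = resolve_setmembers(SAMPLE)
print(resolved)
print("dangling references:", dangling)
```

A non-empty `dangling` list would indicate a bridging relation that refers to a chain missing from both the identity and singleton sections.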

Any suggestions? We want this to feel as obvious and as easy to understand as possible. Some questions:

Current status:

   set         documents   amrs
   WB                 16    812
   DF(LDC)            62   2689
   DFB(UCO)           49   2056
   DFA(UCO)          139   2163
   total             266   7720