danielhers / tupa

Transition-based UCCA Parser
https://danielhers.github.io/tupa
GNU General Public License v3.0

Why does it produce two different trees for the same sentence? #74

Closed. CarolLi closed this issue 5 years ago

CarolLi commented 5 years ago

For some sentences, the online TUPA demo and the pip-installed TUPA (version 1.3.9) produce different trees. This is confusing: which one should I follow? Here is an example sentence:

Pence is making the right call.

Tree produced by the online TUPA:

<root annotationID="0" passageID="1_0">
  <attributes />
  <extra format="ucca" />
  <layer layerID="0">
    <attributes />
    <extra doc="[[[8548371584360429291, 8548371584360429291, 15308085513773655218, 92, 0, 2, 429, 2, 13110060611322374290, 8861071527689086543, 14431812100313486463], [3411606890003347522, 10382539506755952630, 13927759927860985106, 100, 0, 2, 405, 1, 4370460163704169311, 5097672513440128799, 3411606890003347522], [200141916689428108, 9614445426764226664, 1534113631682161808, 100, 0, 2, 8206900633647566924, 0, 13110060611322374290, 646772771845179972, 7679303661980345986], [7425985699627899538, 7425985699627899538, 15267657372422890137, 90, 0, 2, 415, 2, 4088098365541558500, 15369245168918225700, 7425985699627899538], [5943797630011647483, 5943797630011647483, 10554686591937588953, 84, 0, 2, 402, 1, 13110060611322374290, 16562859848569467201, 9602547604334134776], [14229572451745258962, 14229572451745258962, 15308085513773655218, 92, 0, 2, 416, -3, 13110060611322374290, 16900879642891266615, 13409319323822384369], [12646065887601541794, 12646065887601541794, 12646065887601541794, 97, 0, 2, 445, -4, 12646065887601541794, 12646065887601541794, 12646065887601541794]]]" />
    <node ID="0.1" type="Word">
      <attributes paragraph="1" paragraph_position="1" text="pence" />
    </node>
    <node ID="0.2" type="Word">
      <attributes paragraph="1" paragraph_position="2" text="is" />
    </node>
    <node ID="0.3" type="Word">
      <attributes paragraph="1" paragraph_position="3" text="making" />
    </node>
    <node ID="0.4" type="Word">
      <attributes paragraph="1" paragraph_position="4" text="the" />
    </node>
    <node ID="0.5" type="Word">
      <attributes paragraph="1" paragraph_position="5" text="right" />
    </node>
    <node ID="0.6" type="Word">
      <attributes paragraph="1" paragraph_position="6" text="call" />
    </node>
    <node ID="0.7" type="Punctuation">
      <attributes paragraph="1" paragraph_position="7" text="." />
    </node>
  </layer>
  <layer layerID="1">
    <attributes />
    <node ID="1.1" type="FN">
      <attributes />
      <edge toID="1.2" type="H">
        <attributes />
        <category tag="H" />
      </edge>
    </node>
    <node ID="1.2" type="FN">
      <attributes />
      <edge toID="1.3" type="A">
        <attributes />
        <category tag="A" />
      </edge>
      <edge toID="1.4" type="F">
        <attributes />
        <category tag="F" />
      </edge>
      <edge toID="1.5" type="P">
        <attributes />
        <category tag="P" />
      </edge>
      <edge toID="1.6" type="A">
        <attributes />
        <category tag="A" />
      </edge>
    </node>
    <node ID="1.3" type="FN">
      <attributes />
      <edge toID="0.1" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.4" type="FN">
      <attributes />
      <edge toID="0.2" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.5" type="FN">
      <attributes />
      <edge toID="0.3" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.6" type="FN">
      <attributes />
      <edge toID="1.7" type="E">
        <attributes />
        <category tag="E" />
      </edge>
      <edge toID="1.8" type="E">
        <attributes />
        <category tag="E" />
      </edge>
      <edge toID="1.9" type="C">
        <attributes />
        <category tag="C" />
      </edge>
      <edge toID="1.10" type="U">
        <attributes />
        <category tag="U" />
      </edge>
    </node>
    <node ID="1.7" type="FN">
      <attributes />
      <edge toID="0.4" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.8" type="FN">
      <attributes />
      <edge toID="0.5" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.9" type="FN">
      <attributes />
      <edge toID="0.6" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.10" type="PNCT">
      <attributes />
      <edge toID="0.7" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
  </layer>
</root>

Tree produced by the TUPA installed via pip:

<root annotationID="0" passageID="1_1_4_0">
  <attributes />
  <extra format="ucca" />
  <layer layerID="0">
    <attributes />
    <extra doc="[[[8548371584360429291, 8548371584360429291, 15308085513773655218, 91, 0, 2, 426, 2, 13110060611322374290, 8861071527689086543, 14431812100313486463], [3411606890003347522, 10382539506755952630, 13927759927860985106, 99, 0, 2, 402, 1, 4370460163704169311, 5097672513440128799, 3411606890003347522], [200141916689428108, 9614445426764226664, 1534113631682161808, 99, 0, 2, 8206900633647566924, 0, 13110060611322374290, 646772771845179972, 7679303661980345986], [7425985699627899538, 7425985699627899538, 15267657372422890137, 89, 0, 2, 412, 2, 4088098365541558500, 15369245168918225700, 7425985699627899538], [5943797630011647483, 5943797630011647483, 10554686591937588953, 83, 0, 2, 399, 1, 13110060611322374290, 16562859848569467201, 9602547604334134776], [14229572451745258962, 14229572451745258962, 15308085513773655218, 91, 0, 2, 413, -3, 13110060611322374290, 16900879642891266615, 13409319323822384369], [12646065887601541794, 12646065887601541794, 12646065887601541794, 96, 0, 2, 442, -4, 12646065887601541794, 12646065887601541794, 12646065887601541794]]]" />
    <node ID="0.1" type="Word">
      <attributes paragraph="1" paragraph_position="1" text="pence" />
    </node>
    <node ID="0.2" type="Word">
      <attributes paragraph="1" paragraph_position="2" text="is" />
    </node>
    <node ID="0.3" type="Word">
      <attributes paragraph="1" paragraph_position="3" text="making" />
    </node>
    <node ID="0.4" type="Word">
      <attributes paragraph="1" paragraph_position="4" text="the" />
    </node>
    <node ID="0.5" type="Word">
      <attributes paragraph="1" paragraph_position="5" text="right" />
    </node>
    <node ID="0.6" type="Word">
      <attributes paragraph="1" paragraph_position="6" text="call" />
    </node>
    <node ID="0.7" type="Punctuation">
      <attributes paragraph="1" paragraph_position="7" text="." />
    </node>
  </layer>
  <layer layerID="1">
    <attributes />
    <node ID="1.1" type="FN">
      <attributes />
      <edge toID="1.2" type="C">
        <attributes />
        <category tag="C" />
      </edge>
      <edge toID="1.3" type="F">
        <attributes />
        <category tag="F" />
      </edge>
      <edge toID="1.4" type="C">
        <attributes />
        <category tag="C" />
      </edge>
    </node>
    <node ID="1.2" type="FN">
      <attributes />
      <edge toID="0.1" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.3" type="FN">
      <attributes />
      <edge toID="0.2" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.4" type="FN">
      <attributes />
      <edge toID="1.5" type="P">
        <attributes />
        <category tag="P" />
      </edge>
      <edge toID="1.6" type="A">
        <attributes />
        <category tag="A" />
      </edge>
    </node>
    <node ID="1.5" type="FN">
      <attributes />
      <edge toID="0.3" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.6" type="FN">
      <attributes />
      <edge toID="1.7" type="E">
        <attributes />
        <category tag="E" />
      </edge>
      <edge toID="1.8" type="E">
        <attributes />
        <category tag="E" />
      </edge>
      <edge toID="1.9" type="C">
        <attributes />
        <category tag="C" />
      </edge>
      <edge toID="1.10" type="U">
        <attributes />
        <category tag="U" />
      </edge>
    </node>
    <node ID="1.7" type="FN">
      <attributes />
      <edge toID="0.4" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.8" type="FN">
      <attributes />
      <edge toID="0.5" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.9" type="FN">
      <attributes />
      <edge toID="0.6" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
    <node ID="1.10" type="PNCT">
      <attributes />
      <edge toID="0.7" type="Terminal">
        <attributes />
        <category tag="Terminal" />
      </edge>
    </node>
  </layer>
</root>

danielhers commented 5 years ago

Neither is guaranteed to be 100% accurate. The demo was using a different trained model (with the same settings but a different random seed). Now it's updated to use the same pre-trained model as in the release.
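
To see exactly where two such outputs disagree, one option is to compare labeled spans rather than node IDs, since the parser numbers nodes differently in each tree. The following is a minimal sketch using only the Python standard library; it assumes the XML layout shown in this thread (terminal nodes with IDs starting with `0.`, `edge` elements carrying `toID` and `type`). The function names `labeled_spans` and `diff_spans` are made up for illustration and are not part of TUPA or the `ucca` package.

```python
import xml.etree.ElementTree as ET


def labeled_spans(xml_text):
    """Return the set of (edge tag, terminal positions) pairs in a UCCA XML passage.

    Assumes the layout shown in this thread: terminals have IDs like "0.3",
    and every non-terminal lists its children as <edge toID=... type=...>.
    """
    root = ET.fromstring(xml_text)

    # Map each node ID to its outgoing (tag, child ID) edges.
    children = {}
    for node in root.iter("node"):
        children[node.get("ID")] = [
            (edge.get("type"), edge.get("toID")) for edge in node.findall("edge")
        ]

    def terminals(node_id):
        # A layer-0 node covers exactly its own position in the sentence.
        if node_id.startswith("0."):
            return frozenset([int(node_id.split(".")[1])])
        covered = frozenset()
        for _, child_id in children.get(node_id, []):
            covered |= terminals(child_id)
        return covered

    spans = set()
    for edges in children.values():
        for tag, child_id in edges:
            if tag != "Terminal":  # skip the trivial pre-terminal edges
                spans.add((tag, tuple(sorted(terminals(child_id)))))
    return spans


def diff_spans(xml_a, xml_b):
    """Return (spans only in A, spans only in B), each as a sorted list."""
    a, b = labeled_spans(xml_a), labeled_spans(xml_b)
    return sorted(a - b), sorted(b - a)
```

Passing the two XML documents above as strings to `diff_spans` should leave only the edges near the root: `H` over the whole sentence and `A` over "Pence" in the online output, versus `C` over "Pence" and `C` over "making the right call." in the pip output. Everything from the `P` edge downward is identical in both trees.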