goodmami / penman

PENMAN notation (e.g. AMR) in Python
https://penman.readthedocs.io/
MIT License

a bug when decoding an AMR graph #87

Closed lukecyu closed 3 years ago

lukecyu commented 4 years ago

This data is from amr_3.0

# ::id bolt-eng-DF-170-181105-8850361_0170.26 ::date 2017-07-08T14:51:22 ::annotator SDL-AMR-09 ::preferred
# ::snt it is NOT designed to bring the insurance companies down, it IS DESIGNED in the manner to give them ALL A LEVEL PLAYING FIELD so that true competition exists.
# ::save-date Sun Jul 16, 2017 ::file bolt-eng-DF-170-181105-8850361_0170_26.txt
(c2 / contrast-01
      :ARG1 (d / design-01 :polarity -
            :ARG1 (i2 / it)
            :ARG3 (b / bring-down-03
                  :ARG0 i2
                  :ARG1 (c / company
                        :ARG0-of (i / insure-02)
                        :ARG0-of i)))
      :ARG2 (d2 / design-01
            :ARG1 i2
            :ARG3 (g / give-01
                  :ARG0 i2
                  :ARG1 (f / field
                        :location-of (p / play-01)
                        :ARG1-of (l / level-04))
                  :ARG2 (c3 / company
                        :mod (a / all))
                  :purpose (e / exist-01
                        :ARG1 (c4 / compete-02
                              :ARG1-of (t / true-01))))))

When I use penman to process this file (named f below):

import penman
from penman.layout import appears_inverted

g = penman.load(f)[0]
e = penman.encode(g)   # re-encode to a string; it differs from the original at the edge (c, i)
g2 = penman.decode(e)
for edge in g2.edges():
    print(appears_inverted(g2, edge))

result:

False
False
False
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/cyu28/anaconda3/envs/pytorch3/lib/python3.7/site-packages/penman/layout.py", line 584, in appears_inverted
    for variable, _triple in zip(node_contexts(g), g.triples):
  File "/home/cyu28/anaconda3/envs/pytorch3/lib/python3.7/site-packages/penman/layout.py", line 624, in node_contexts
    if stack[-1] not in eligible:
IndexError: list index out of range

I found that in g2, POP occurs more often than PUSH, but there is no such problem with g (the original file).
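The imbalance can be pictured with a small sketch in plain Python. It does not use penman's internals: the string markers and the `first_underflow` helper are hypothetical stand-ins, and the only assumption is that each POP removes a context that an earlier PUSH added. An extra POP then leaves nothing on the stack, which is consistent with the `stack[-1]` IndexError in the traceback:

```python
def first_underflow(markers):
    """Return the index of the first POP that finds an empty stack, else None."""
    depth = 0
    for index, marker in enumerate(markers):
        if marker == "PUSH":
            depth += 1
        elif marker == "POP":
            if depth == 0:
                # Nothing left to pop: this is where a real stack would fail.
                return index
            depth -= 1
    return None


balanced = ["PUSH", "POP", "PUSH", "POP"]
unbalanced = ["PUSH", "POP", "POP"]  # one POP too many

print(first_underflow(balanced))
print(first_underflow(unbalanced))
```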

(edited for formatting)

goodmami commented 4 years ago

Thanks for the report. The issue is duplicate edges. These have given me problems in the past (see #34, #35).

To create a more minimal example:

(c / company
   :ARG0-of (i / insure-02)
   :ARG0-of i)

The (i, ARG0, c) triple appears twice here. When you serialize it again, the layout engine writes it this way:

(c / company
   :ARG0-of (i / insure-02
               :ARG0 c))

This is because epigraph markers like Push and Pop are indexed by their triple, so the two duplicate triples are conflated here. Parsing this new graph then gives you the odd situation where the triple (i, ARG0, c) appears both inverted and in regular orientation. This breaks an assumption in the code and results in the error you saw.
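To make the conflation concrete, here is a minimal sketch in plain Python. It does not call penman: the tuple representation and the `normalize` helper are hypothetical simplifications, and it assumes every role ending in `-of` is an inverted role (real models treat some roles specially). It shows that the two surface edges of the re-serialized graph normalize to the same triple, once inverted and once not:

```python
from collections import Counter


def normalize(source, role, target):
    """Return (triple, inverted) with any '-of' inversion undone.

    Simplifying assumption: every role ending in '-of' is an inverted role.
    """
    if role.endswith("-of"):
        return (target, role[: -len("-of")], source), True
    return (source, role, target), False


# Surface edges as they are written in the re-serialized graph.
surface_edges = [
    ("c", ":ARG0-of", "i"),  # written on node c, inverted
    ("i", ":ARG0", "c"),     # written on node i, regular orientation
]

normalized = [normalize(*edge) for edge in surface_edges]
counts = Counter(triple for triple, _ in normalized)
orientations = {inverted for _, inverted in normalized}

print(counts)        # the same triple is counted twice
print(orientations)  # it appears both inverted and non-inverted
```

Anything keyed by the triple alone cannot tell these two occurrences apart.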

I believe these duplicated edges are bad, so probably the best option is to prune them out when reading the tree from the string, even though I'm not fond of altering the graph at all.
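As a rough sketch of what pruning could look like (an illustration on plain tuples, not penman's actual implementation; `prune_duplicates` is a hypothetical helper), repeated triples can be dropped while keeping the first occurrence and the original order:

```python
def prune_duplicates(triples):
    """Drop repeated triples, keeping the first occurrence and the order."""
    # dict preserves insertion order, so fromkeys acts as an ordered set.
    return list(dict.fromkeys(triples))


triples = [
    ("c", ":instance", "company"),
    ("i", ":ARG0", "c"),
    ("i", ":instance", "insure-02"),
    ("i", ":ARG0", "c"),  # duplicate edge from the minimal example
]

pruned = prune_duplicates(triples)
print(pruned)
```

The trade-off is the one noted above: the graph that comes out is no longer exactly what was written in the string.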