goodmami / penman

PENMAN notation (e.g. AMR) in Python
https://penman.readthedocs.io/
MIT License
141 stars 26 forks source link

Surface alignments on reentrant variables can cause errors #93

Closed goodmami closed 3 years ago

goodmami commented 4 years ago

Observe:

>>> import penman
>>> g = penman.decode('(a / alpha :ARG0 b~1)')  # alignment on attribute value
>>> print(penman.encode(g))  # no problem
(a / alpha
   :ARG0 b~1)
>>> g.triples.append(('b', ':instance', 'beta'))  # the attribute value is now a variable
>>> print(penman.encode(g))  # oops
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/goodmami/git/penman/penman/codec.py", line 237, in _encode
    return codec.encode(g,
  File "/home/goodmami/git/penman/penman/codec.py", line 130, in encode
    tree = layout.configure(g, top=top, model=self.model)
  File "/home/goodmami/git/penman/penman/layout.py", line 271, in configure
    raise LayoutError('possibly disconnected graph')
penman.exceptions.LayoutError: possibly disconnected graph
>>> g.epidata = {}
>>> print(penman.encode(g))  # without the surface alignment, all is good
(a / alpha
   :ARG0 (b / beta))

This is because we cannot have a surface alignment after a full node (:ARG0 (b / beta)~1 is apparently illegal syntax in AMR, but I cannot find the reference for that point). Better would be to drop the alignment with a warning.

This situation may turn up if someone is programmatically constructing graphs with alignments, or possibly when rearranging a graph that (validly) has an alignment on a reentrant variable.

bjascob commented 4 years ago

Good to know that this issue can crop up. Generally, I'm not sure you'd want to modify a graph with existing surface alignments. At least you'd have to be very careful doing that.

The one use case I can think of is adding the :wiki tags, to graphs. This is something that might be done as post-processing, so if adding a :wiki will cause errors, that might cause some headaches. Still, I'm guessing that for most people it wouldn't be an issue.

Just FYI that I have processed a large portion of the aligned LDC2020T02 (AMR 3.0) graphs through penman.decode(x) and haven't seen any errors when re-encoding them. Not sure if that confirms that, at least for this corpus, "we do not have a surface alignment after a full node"