goodmami / penman

PENMAN notation (e.g. AMR) in Python
https://penman.readthedocs.io/
MIT License
139 stars 27 forks source link

Re-referenced Node Formatting #119

Open Zoher15 opened 1 year ago

Zoher15 commented 1 year ago

Hi @goodmami ,

I see this weird formatting quirk where a re-referenced node is written to file within parantheses sometimes: image

This is creating issues with tools that use their own parsers like smatch and sembleu. Is there a way to avoid this?

Best, Zoher

Zoher15 commented 1 year ago

This happens when 'graph.triples' are shuffled in place.....

goodmami commented 1 year ago

@Zoher15 thanks for the report. This is clearly a bug.

I think this is because the epigraphical data from the original graph doesn't match the shuffled triples, and a Push marker is trying to create a new node context on the re-entrant variable (if you're not familiar with how Penman uses an epigraph to encode the serialized layout of a graph, the penman.layout module documentation gives a summary at the top).

We can create a MWE like this:

>>> import penman
>>> from penman.layout import Push
>>> g = penman.Graph(
...     [("a", ":instance", "A"), ("a", ":ARG0", "b")],
...     epidata={("a", ":ARG0", "b"): [Push("b")]}
... )
>>> print(penman.encode(g))
(a / A
   :ARG0 (b))

Penman is supposed to respect the epigraph data when they match the triples in the graph and to ignore them otherwise. In this case we should check in penman.layout.configure() that upon encountering a triple with the epigraphical marker Push(x), the next triple should be (x, ":instance", ...), and if not to ignore the Push.

Zoher15 commented 1 year ago

Thanks @goodmami! I obviously got around it by shuffling a copy of the triples instead.

PS: This is an amazing package! Thanks for your work!

goodmami commented 1 year ago

@Zoher15 I don't know how robust a solution shuffling a copy of the triples is. It might just be that a different shuffling didn't cause the issue to surface.

This is clearly a bug.

I spoke too soon. This is in fact a supported graph type in Penman. However I agree that it is generally unacceptable for AMR. So rather than changing the layout code, we might need something specific for the AMR model.

flipz357 commented 11 months ago

@Zoher15 What you posted is not a bug in penman and I also see no reason why it shouldn't be a valid AMR (it's just redundancy). So this is an issue that you better submit to the smatch/sembleu repos. Note that the smatch/sembleu AMR reading can also be buggy in other ways, so what you should actually do is use an improved amr/graph reader like penman, or use smatch++ which also fixes some other issues in smatch.