alexanderkoller opened 5 years ago
A first version of the conversion is implemented. However, it requires the input to already contain inverted edges (e.g. `ARG1-of` instead of `ARG1`) where necessary. For the sample `wsj` AMR data as well as the MRP training data this seems to be the case, so the currently implemented simple depth-first search will suffice.
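The depth-first serialization and its precondition can be sketched as follows. This is an illustrative toy implementation over hypothetical dictionaries, not mtool's actual code; it assumes edges running against the traversal direction already carry an inverted label such as `ARG1-of`.

```python
def to_penman(labels, edges, root):
    """Serialize a rooted graph to a PENMAN string by depth-first search.

    `labels` maps node variables to concept labels; `edges` maps each
    variable to a list of (label, target) pairs.  These are hypothetical
    data structures for illustration only.
    """
    visited = set()

    def visit(node):
        if node in visited:          # re-entrancy: emit just the variable
            return node
        visited.add(node)
        out = f"({node} / {labels[node]}"
        for label, target in edges.get(node, []):
            out += f" :{label} {visit(target)}"
        return out + ")"

    return visit(root)
```

For a graph with a re-entrant node (e.g. the boy in "the boy wants to go"), the second reference is emitted as a bare variable, which only works if the edge directions already admit a single top-down traversal.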
Why I haven't requested a pull yet:

- I'm not sure the quoting of constants is handled correctly in all cases: `:op1 "Pierre"` needs the quotes, but for `:polarity -` no `"` is required. Are there other cases like that that I missed?
- Unfortunately, the conversion is not lossless :( Sometimes things like `quant` are used as properties and sometimes they are used as edges, so turning them into edges and making them properties again doesn't work, because there's ambiguity. See for instance the graph `PROXY_AFP_ENG_20020416_0331.17`.
So, I'd suggest adding a suffix, e.g. `_prop`, to those edges that are meant to be interpreted as properties... We then have to adapt the decomposition code, though, to take the new edge labels into account.
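The suggested `_prop` marking could be as simple as a pair of helpers like these (a sketch; the suffix and function names are made up for illustration):

```python
PROP_SUFFIX = "_prop"  # hypothetical marker, as suggested above

def mark_property_edge(label):
    """Tag an edge label that originated from a property."""
    return label + PROP_SUFFIX

def restore(label):
    """Split a (possibly suffixed) edge label into (label, is_property),
    making the edge-to-property back-conversion unambiguous."""
    if label.endswith(PROP_SUFFIX):
        return label[:-len(PROP_SUFFIX)], True
    return label, False
```

With this, a `quant` property becomes a `quant_prop` edge during decomposition and can be restored deterministically, while a genuine `quant` edge is left alone.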
Good catch. Before we invest in this, can we quantify how much this hurts us?
Not in isolation. With all pre- and postprocessing, when I compare a random subset of the decomposable graphs against their gold graphs, I get the following MRP scores:
```json
{"n": 1491,
 "exact": 1487,
 "tops": {"g": 1491, "s": 1491, "c": 1490, "p": 0.9993293091884641, "r": 0.9993293091884641, "f": 0.9993293091884641},
 "labels": {"g": 14765, "s": 14431, "c": 14234, "p": 0.9863488323747488, "r": 0.9640365729766339, "f": 0.9750650774078641},
 "properties": {"g": 2412, "s": 2723, "c": 2237, "p": 0.821520381931693, "r": 0.9274461028192371, "f": 0.8712755598831547},
 "anchors": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
 "edges": {"g": 14718, "s": 13692, "c": 13599, "p": 0.9932077125328659, "r": 0.9239706481858948, "f": 0.9573389651531151},
 "attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0},
 "all": {"g": 33386, "s": 32337, "c": 31560, "p": 0.9759717970127099, "r": 0.9453064158629365, "f": 0.9603943824840618},
 "time": 216.625990977}
```
Other sources of information loss are:
But from your data it looks like we actually do quite badly on the "properties" score, which could be explained by the lossiness of the conversion. Right? So then this would indeed be worth fixing.
Yes, that's probably true. The negative effect of the relabeling on the property score is probably very minor.
I just factored this into a separate issue #25 and took the liberty of assigning Ms. Weißenhorn to it.
Given that the AMR conversion itself is done, shall we relabel this issue as post-mrp-cleanup and postpone the actual pull request until after the deadline?
Mhm, I wonder how mtool was able to convert AMRs to MRPs correctly without special `_prop` suffixes for properties. To me it seems like mtool was able to detect properties in a normal PENMAN graph, so I wonder whether we can use the same strategy and avoid adding special suffixes to some edges.
I think they make a distinction between nodes that have names and nodes that don't. (At least, that's my reading.) Nodes without names probably become properties, and nodes with names become actual nodes in the MRP graph. However, I'm not completely sure whether the annotators would agree with this interpretation of the AMR PENMAN string, because it has weird consequences (or this is an annotation error?):
`bolt-eng-DF-170-181106-8873009_0024.7`: "as 7 belong to Bush and 1 to Obama."
In any case, this distinction doesn't work for us, because the Alto code doesn't know about properties and works with graphs that have nodes, edges (and sources), nothing more.
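For illustration, the apparent mtool heuristic (a triple whose target is not an instantiated variable becomes a property) might look roughly like this. This is our reading of mtool's behaviour, not its actual code:

```python
def split_triples(triples, variables):
    """Partition AMR-style triples into true edges and property triples.

    `triples` is a list of (source, role, target) tuples and `variables`
    is the set of instantiated node variables.  A target that is not a
    variable (e.g. the constant "7") is treated as a property value.
    Hypothetical data structures for illustration.
    """
    edges, props = [], []
    for src, role, tgt in triples:
        (edges if tgt in variables else props).append((src, role, tgt))
    return edges, props
```

Under this heuristic, `:quant 7` on a named node becomes a property, while `:ARG0 g` stays an edge, which matches the named/unnamed distinction described above.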
Maybe the `bolt-eng-DF-170-181106-8873009_0024.7` graph in the `dfb` part of the training corpus is an annotation error. In the MRP file the `ARG0 7` and `ARG0 1` are said to be property-value pairs, although I agree with you that this is somewhat weird. Anyway, I guess that when Alto reads in AMR graphs it needs to create a new node for the value of a property. If these newly created nodes have special names (`explicitanon...`), maybe you can identify such special nodes and convert them back to property-value pairs for the MRP output? If that's too complicated I can modify the AMR write method for now (if you want mtool to also read such AMRs, I would also need to modify the read method, and that's going to be more complicated, I guess).
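A sketch of the proposed back-conversion, assuming Alto marks the value nodes with a recognizable prefix. The prefix `explicitanon` and the data structures here are placeholders, not Alto's actual internals:

```python
ANON_PREFIX = "explicitanon"  # placeholder; the actual Alto node name may differ

def fold_properties(nodes, edges):
    """Turn edges that point at specially named value nodes back into
    property/value pairs for MRP output.

    `nodes` maps node names to labels, `edges` is a list of
    (source, label, target) tuples.  Returns (properties, kept_edges).
    """
    props, kept_edges = {}, []
    for src, label, tgt in edges:
        if tgt.startswith(ANON_PREFIX):
            props.setdefault(src, []).append((label, nodes[tgt]))
        else:
            kept_edges.append((src, label, tgt))
    return props, kept_edges
```

The value node itself is dropped from the graph once its label has been folded back into a property on the source node.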
Regarding postponing the pull request for mtool until after the deadline (@alexanderkoller):
Yes, I would like to postpone it. One reason is that the AMR write method currently won't work for graphs that don't contain inverted edges (`x-of`) where necessary for a simple depth-first search to succeed. Since all graphs in the MRP training data already contain such inverted edges where necessary, the AMR write method works for now, but I would like to eliminate this limitation before sending a pull request.
If we change the code to add suffixes to properties as @namednil suggests, this could be used as an argument against accepting a pull request, because `:polarity_prop` or `:quant_prop` are not among the usual AMR edges/properties.
In order to use the old AMR preprocessing pipeline, the easiest solution would be to simply convert the MRP data to AMRBank format. Let's be good citizens and add the code for this to the official mtool tool.
The way to do this is as follows:
1. Add a `write` method to `codec/amr.py` to write graphs in AMR format.
2. Check for `arguments.write == "amr"` and call the new write method.
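In sketch form, the dispatch for step 2 could look like this. This is a simplified illustration; mtool's real argument handling, codec interface, and graph objects differ:

```python
import io

def write_amr(graph, stream):
    # Stand-in for the new codec/amr.py write method; here we just
    # assume the graph is already a PENMAN string.
    print(graph, file=stream)

# Hypothetical dispatch table keyed on the --write format name.
WRITERS = {"amr": write_amr}

def write_graphs(graphs, fmt, stream):
    """Look up the writer for `fmt` and apply it to every graph."""
    writer = WRITERS[fmt]
    for graph in graphs:
        writer(graph, stream)
```

A dispatch table keeps the main loop codec-agnostic, so adding further output formats later only means registering another writer.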