cfmrp / mtool

Software to Manipulate Different Flavors of Semantic Graphs
http://mrp.nlpl.eu
GNU Lesser General Public License v3.0

normalizing UCCA graphs for evaluation #65

Open danielhers opened 5 years ago

danielhers commented 5 years ago

The Graph.normalize() function already normalizes some phenomena specific to AMR and EDS: https://github.com/cfmrp/mtool/blob/94cc1a46367203bb7f55f0eb4cf20dc7fa208adb/graph.py#L57-L83 But some issues remain. In particular, UCCA's native scorer normalizes Centers and Functions (https://github.com/cfmrp/mtool/blob/master/ucca/normalization.py) by flattening nested Centers and ignoring the attachment of Functions. The same normalization should be applied in MRP too.

oepen commented 5 years ago

that does sound like a somewhat more contentful normalization than what we are doing so far. are F-unctions generally ignored in scoring, or just in interaction with C-enters? if the former, one could argue that the MRP serializations should maybe just remove the nodes and edges involved—as we did for implicit nodes in UCCA, tense, aspect, number, etc. in EDS, and :wiki links in AMR? see also #36.

current normalization, i believe, is local to one node or edge, i.e. downcasing and trimming initial or final punctuation marks in anchors. in comparison, rewriting or discarding some of the structure would feel like straying further from the ideal that everything in the MRP graphs is evaluated.
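To make "local to one node or edge" concrete, here is a minimal sketch of that kind of normalization. It is not the code in Graph.normalize(); the function names `trim_anchor` and `normalize_label` are hypothetical, and it assumes anchors are half-open character ranges into the input string.

```python
import string

# Characters that a local normalization might strip from the edges of an anchor.
# Assumption: ASCII punctuation plus whitespace is enough for illustration.
_SKIP = set(string.punctuation) | set(string.whitespace)


def trim_anchor(text, start, end):
    """Shrink the half-open range [start, end) so that it no longer
    begins or ends with punctuation or whitespace."""
    while start < end and text[start] in _SKIP:
        start += 1
    while end > start and text[end - 1] in _SKIP:
        end -= 1
    return start, end


def normalize_label(label):
    """Case-fold a node label; nothing outside this one label is consulted."""
    return label.lower()


text = 'He said, "Hello!"'
start, end = trim_anchor(text, 9, 17)
print(text[start:end], normalize_label("Hello"))
```

Each call looks at a single anchor or label only, which is what distinguishes it from a normalization that rewrites or discards structure.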

danielhers commented 5 years ago

Functions themselves are never ignored in scoring; only their attachment location is, regardless of any interaction with Centers. In detail, if the same span is a Function in both the gold and the evaluated graph, then the unit with that span is moved to the root in both. We tried to come up with a normalization strategy that would need only one graph at a time, but weren't sure how to do it right.
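The two-graph rule described above can be sketched roughly as follows. This is only an illustration, not the code in ucca/normalization.py: the dict-based graph representation, the function name `reattach_shared_functions`, the edge label "F", and the assumption that the root has id 0 in both graphs are all hypothetical.

```python
def reattach_shared_functions(gold, system, root=0, label="F"):
    """Move every Function unit whose span is a Function in BOTH graphs
    to the root, so that its original attachment point cannot affect scoring.

    Each graph is a dict (hypothetical format):
      "edges": list of (source, target, label) triples
      "spans": {node id: (start, end)} for anchored units
    """

    def function_spans(graph):
        # Spans of all units that are the target of a Function edge.
        return {
            graph["spans"][tgt]
            for _, tgt, lab in graph["edges"]
            if lab == label and tgt in graph["spans"]
        }

    # Only spans that both graphs treat as Functions are normalized.
    shared = function_spans(gold) & function_spans(system)

    def rewrite(graph):
        edges = []
        for src, tgt, lab in graph["edges"]:
            if lab == label and graph["spans"].get(tgt) in shared:
                src = root  # re-attach the Function unit to the root
            edges.append((src, tgt, lab))
        return {"edges": edges, "spans": dict(graph["spans"])}

    return rewrite(gold), rewrite(system)


# The Function over span (0, 3) hangs off different parents in the two graphs.
gold = {"edges": [(0, 1, "H"), (1, 2, "F"), (1, 3, "C")],
        "spans": {2: (0, 3), 3: (4, 7)}}
system = {"edges": [(0, 1, "H"), (1, 4, "E"), (4, 2, "F"), (1, 3, "C")],
          "spans": {2: (0, 3), 3: (4, 7)}}
norm_gold, norm_system = reattach_shared_functions(gold, system)
print(norm_gold["edges"])
print(norm_system["edges"])
```

Because `shared` is an intersection over both graphs, the rewrite cannot be computed from one graph alone, which is exactly the difficulty with a one-graph-at-a-time strategy.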