Open danielhers opened 5 years ago
that does sound like a somewhat more contentful normalization than what we are doing so far. are F-unctions generally ignored in scoring, or just in interaction with C-enters? if the former, one could argue that the MRP serializations should maybe just remove the nodes and edges involved—as we did for implicit nodes in UCCA, tense, aspect, number, etc. in EDS, and :wiki links in AMR? see also #36.
current normalization, i believe, is local to one node or edge, i.e. downcasing and trimming initial or final punctuation marks in anchors. in comparison, rewriting or discarding some of the structure would feel like straying further from the ideal that everything in the MRP graphs is evaluated.
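To make "local to one node" concrete, here is a minimal sketch of that kind of anchor normalization. It is not the mtool code: `trim_anchor` and `normalized_surface` are hypothetical helper names, anchors are assumed to be `(start, end)` character offsets into the input string, and only ASCII punctuation is handled.

```python
import string

# Assumption: only ASCII punctuation counts; a real implementation
# may use a different (e.g. Unicode-aware) punctuation inventory.
PUNCTUATION = set(string.punctuation)


def trim_anchor(text, start, end):
    """Shrink the span [start, end) so it neither starts nor ends on punctuation."""
    while start < end and text[start] in PUNCTUATION:
        start += 1
    while end > start and text[end - 1] in PUNCTUATION:
        end -= 1
    return start, end


def normalized_surface(text, start, end):
    """Return the downcased surface string of the trimmed anchor."""
    start, end = trim_anchor(text, start, end)
    return text[start:end].lower()
```

Each call looks at a single anchor and nothing else in the graph, which is what makes this kind of normalization cheap to justify compared to rewriting structure.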
Functions themselves are never ignored in scoring; only their attachment location is, regardless of any interaction with Centers. In detail: if the same span is a Function in both the gold and the evaluated graph, the unit with that span is moved to the root in both. We tried to come up with a normalization strategy that would need only one graph at a time, but weren't sure how to do it right.
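A rough sketch of the two-graph strategy described above, under simplifying assumptions: the `Graph` and `Edge` types are hypothetical stand-ins (not the mtool or UCCA data structures), every node has a single `(start, end)` span, Function edges carry the label `"F"`, and remote edges are not treated specially.

```python
from typing import Dict, List, NamedTuple, Tuple

Span = Tuple[int, int]


class Edge(NamedTuple):
    parent: str
    child: str
    label: str


class Graph(NamedTuple):
    root: str
    edges: List[Edge]
    spans: Dict[str, Span]  # node id -> (start, end) span, hypothetical layout


def function_spans(graph):
    """Collect the spans of all units attached by an F (Function) edge."""
    return {graph.spans[e.child] for e in graph.edges if e.label == "F"}


def reattach_shared_functions(gold, system):
    """Move Functions whose span is a Function in BOTH graphs to the root.

    This needs both graphs at once, which is why it does not fit a
    one-graph-at-a-time normalize() step.
    """
    shared = function_spans(gold) & function_spans(system)

    def move(graph):
        # Build a new edge list; the input graph is left unmodified.
        edges = [
            Edge(graph.root, e.child, e.label)
            if e.label == "F" and graph.spans[e.child] in shared
            else e
            for e in graph.edges
        ]
        return graph._replace(edges=edges)

    return move(gold), move(system)
```

With this, a Function that both graphs agree on (by span) is scored the same wherever it was attached, while a Function present in only one graph keeps its original attachment and is still penalized.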
The Graph.normalize() function already normalizes some phenomena specific to AMR and EDS: https://github.com/cfmrp/mtool/blob/94cc1a46367203bb7f55f0eb4cf20dc7fa208adb/graph.py#L57-L83 But some issues remain. In particular, the native UCCA scorer normalizes Centers and Functions (https://github.com/cfmrp/mtool/blob/master/ucca/normalization.py) by flattening nested Centers and ignoring the attachment of Functions. The same normalization should be applied in MRP too.