cfmrp / mtool

Software to Manipulate Different Flavors of Semantic Graphs
http://mrp.nlpl.eu
GNU Lesser General Public License v3.0

Smatch of a UCCA graph against itself gives <1 #56

Open danielhers opened 5 years ago

danielhers commented 5 years ago

The smatch scorer could probably take advantage of the identity of nodes better. Running:

./main.py --trace --trace --read mrp --score smatch --gold data/score/ucca/id.mrp data/score/ucca/id.mrp

This gives me an F-score of 0.7435897435897437, even though I'm passing the same file as both the gold and the system input.

oepen commented 5 years ago

i fear this is ultimately an inevitable possibility in the SMATCH philosophy, just as much as in the approach we have adopted for the MRP metric.

the search space for node-to-node correspondences can be too vast to explore fully, maybe especially with the UCCA graphs, where the majority of nodes are unlabeled and unanchored, leaving the correspondence wholly to structural considerations. that leaves the solution at the mercy of the ‘smart’ initialization and the effectiveness of the approximate search.

if there were an expectation for node identifiers across graphs to pattern in certain ways, i imagine one could extend initialization to minimize the (unsigned) difference between node identifiers, possibly as a secondary criterion? but i am unsure there is such an expectation for identifier distributions across parsers?
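To make the idea concrete, here is a minimal sketch of such an initialization. It is not mtool's actual code: the function name `initial_mapping` and the graph representation (a dict from integer node id to label, with `None` for unlabeled nodes) are assumptions made for illustration. Label agreement is the primary criterion, and the unsigned identifier difference only breaks ties.

```python
def initial_mapping(gold, system):
    """Greedy initial node correspondence (hypothetical helper, not mtool's API).

    gold, system: dicts mapping integer node id -> label (None if unlabeled).
    Primary criterion: the labels agree.
    Secondary criterion: smallest unsigned difference between node ids.
    """
    unused = set(system)
    mapping = {}
    for g in sorted(gold):
        if not unused:
            break  # more gold nodes than system nodes: leave the rest unmapped
        # Tuples sort lexicographically, so a label mismatch (True) always
        # loses to a label match (False) before id distance is considered.
        best = min(unused, key=lambda s: (gold[g] != system[s], abs(g - s), s))
        mapping[g] = best
        unused.remove(best)
    return mapping


# Mostly unlabeled nodes, as in UCCA: identifiers decide the pairing.
graph = {0: None, 1: None, 2: "a"}
print(initial_mapping(graph, graph))
```

When a graph is compared against itself, every node finds its identical counterpart at distance zero, so the search would start from the identity mapping. Whether identifiers pattern this way across different parsers remains the open question.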

if nothing else, i would like to think one can make this implausible scoring result (on an artificial test case) go away by allowing the random-restart hill-climbing a longer leash? and anyway, i take it the MRP scorer manages to steer clear of the problem in this particular instance?

danielhers commented 5 years ago

This relates to previous discussions we've had, and of course it will always be possible to get inaccurate results because of the scorer's random nature. However, I feel we should be able to encourage it to find the trivial identity mapping more easily, rather than letting it run for longer to search for it. We did some work in this direction for MCES, but it looks like Smatch still suffers.
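As a rough illustration of what I mean, here is a simplified sketch in which the identity mapping is tried as the first start point, before any random restarts. This is not the mtool or Smatch implementation: the names `matched_edges`, `hill_climb` and `best_match` are hypothetical, only labeled edges are counted (no node labels, properties or anchors), and both graphs are assumed to share the same set of node ids.

```python
import itertools
import random


def matched_edges(mapping, gold_edges, system_edges):
    """Count gold edges whose image under the mapping exists in the system graph."""
    return sum((mapping[s], mapping[t], label) in system_edges
               for s, t, label in gold_edges)


def hill_climb(start, gold_edges, system_edges):
    """Swap pairs of node assignments for as long as a swap improves the score."""
    mapping = dict(start)
    best = matched_edges(mapping, gold_edges, system_edges)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(sorted(mapping), 2):
            mapping[a], mapping[b] = mapping[b], mapping[a]
            score = matched_edges(mapping, gold_edges, system_edges)
            if score > best:
                best, improved = score, True
            else:
                mapping[a], mapping[b] = mapping[b], mapping[a]  # undo the swap
    return best, mapping


def best_match(nodes, gold_edges, system_edges, restarts=3, seed=0):
    """Hill-climb from the identity mapping first, then from random permutations."""
    rng = random.Random(seed)
    starts = [{n: n for n in nodes}]  # identity seed comes first
    for _ in range(restarts):
        shuffled = list(nodes)
        rng.shuffle(shuffled)
        starts.append(dict(zip(nodes, shuffled)))
    best_score, best_mapping = -1, None
    for start in starts:
        score, mapping = hill_climb(start, gold_edges, system_edges)
        if score > best_score:
            best_score, best_mapping = score, mapping
        if best_score == len(gold_edges):
            break  # every gold edge is matched, so no restart can do better
    return best_score, best_mapping


nodes = [0, 1, 2, 3]
edges = {(0, 1, "P"), (0, 2, "A"), (2, 3, "C")}
score, mapping = best_match(nodes, edges, edges)
print(score, mapping)
```

With the identity seed, scoring a graph against itself matches every edge on the first start and skips the random restarts entirely. For genuinely different graphs the identity seed is just one more start point, so it should not hurt.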

Yes, the MRP scorer works fine here but still takes a long time, which is a different issue.