discord between SMATCH and MRP counts

oepen commented 5 years ago

the counts of matching (i.e. shared) triples between a gold and system graph sometimes diverge between the SMATCH and the MRP metrics, even when forcing the latter to use the exact same node-to-node correspondence as returned from SMATCH; for example:

./main.py --read amr --trace --trace --score mrp --gold data/score/amr/233.gold.amr data/score/amr/233.system.amr

for this small pair of graphs, SMATCH reports ten matching nodes, compared to a sum of nine from the MRP implementation. the SMATCH counts are the same from our wrapper or when using the official version directly from the command line:

python3 snowblink/smatch.py -vv -f data/score/amr/233.gold.amr data/score/amr/233.system.amr > /tmp/smatch.log 2>&1

i extracted the sets of triples for both graphs from smatch.log and renamed nodes for the system graph according to the node correspondences reported in the log file ([0, 3, 4, -1, 2, 1]); the resulting triple sets (one per line, lexicographically sorted) are in the attached files g.txt and s.txt. assuming the manual renaming of nodes did not introduce any errors, the comm command appears to side with the MRP counts:

comm -12 /tmp/g.txt /tmp/s.txt | wc -l
9

233.gold.pdf 233.system.pdf g.txt s.txt

timjogorman commented 5 years ago

As far as I can tell, this seems to be a bug in the MRP metric code -- normalization of inverse edges wasn't getting lowercased like everything else. The metrics end up differing because the SMATCH code is robust to it, but the score function used in MRP is not. The hand-coded files attached above had a typo which probably led the analysis astray -- ('instance', 'join-up-02') should be ('instance', 'a0', 'join-up-02').

After adding a line to the Edge object making sure to lowercase self.normal when we lowercase self.lab, I'm getting the same number of matching nodes between MRP and SMATCH.

oepen commented 5 years ago

thanks for wrapping up the night shift, tim! further towards perfection: that seems like an important fix. no more mismatches on the first 250 or so CoLi graphs ...

cfmrp / mtool

discord between SMATCH and MRP counts #51