ablodge / leamr

A structurally comprehensive dataset of AMR-to-text alignments for coverage of a larger variety of linguistic phenomena, for research related to AMR parsing, generation, and evaluation.

Relation Coverage Metric #8

Open Carlosml26 opened 2 years ago

Carlosml26 commented 2 years ago

Hi, when I evaluate each alignment type (subgraph, relation, ...) with the provided evaluation code, the coverage for relations comes out at around 75% for both the predictions and the gold:

pred coverage: 74.16%
gold coverage: 74.03%
Span F1:        91.98   (#gold 1168)
Score   Precision       Recall  F1
Partial Align:  87.77   87.99   87.88   (#gold 1168)
Exact Align:    84.63   84.85   84.74   (#gold 1168)

However, the paper reports relation coverage as 100%. Is that because the missing relations are assumed to be already covered within subgraph structures, so that the combined predictions from subgraphs and relations reach 100%? Or am I doing something wrong when evaluating relations?

Thanks.

ablodge commented 2 years ago

Hi @Carlosml26, thank you for pointing this out. The paper is correct about the 100% coverage for relations, though the evaluation scripts don't make that obvious. (This code is from my dissertation research, and I've been meaning to clean it up when I have the chance.) I believe the issue you're seeing arises because AMR edges are aligned in several layers: ~75% are aligned in the relation layer, some are aligned in the subgraph layer for subgraphs like (c/city :name (n/name :op1 "New" :op2 "York")) that contain one or more edges, and some are aligned in the duplicate subgraph layer. I'll double-check this and update the evaluation script to make that clear.
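
To illustrate the layering described above, here is a minimal sketch of how per-layer and combined edge coverage could differ. It is not the repository's evaluation code: the `Edge` tuple representation, the `coverage` and `combined_coverage` helpers, the layer names, and the toy graph are all assumptions made for illustration, and the toy numbers are not meant to reproduce the ~75% figure.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical edge representation: (source variable, role, target).
Edge = Tuple[str, str, str]


def coverage(all_edges: Set[Edge], aligned: Iterable[Edge]) -> float:
    """Fraction of the graph's edges that appear among the aligned edges."""
    if not all_edges:
        return 1.0
    return len(all_edges & set(aligned)) / len(all_edges)


def combined_coverage(all_edges: Set[Edge], layers: Dict[str, Set[Edge]]) -> float:
    """Coverage over the union of the edges aligned in every layer."""
    covered: Set[Edge] = set().union(*layers.values())
    return coverage(all_edges, covered)


# Toy graph (assumed): a predicate with two arguments, one of which is the
# named entity (c/city :name (n/name :op1 "New" :op2 "York")).
edges: Set[Edge] = {
    ("l", ":ARG0", "b"),
    ("l", ":location", "c"),
    ("c", ":name", "n"),
    ("n", ":op1", '"New"'),
    ("n", ":op2", '"York"'),
}

# Assumed layer contents: edges internal to the named-entity subgraph are
# aligned with that subgraph, not in the relation layer.
layers: Dict[str, Set[Edge]] = {
    "relations": {("l", ":ARG0", "b"), ("l", ":location", "c")},
    "subgraphs": {("c", ":name", "n"), ("n", ":op1", '"New"'), ("n", ":op2", '"York"')},
    "duplicate_subgraphs": set(),
}

relation_only = coverage(edges, layers["relations"])
all_layers = combined_coverage(edges, layers)
print(f"relation layer coverage: {relation_only:.2%}")
print(f"combined coverage:       {all_layers:.2%}")
```

In this toy setup the relation layer alone covers only the two edges outside the named-entity subgraph, while the union of all layers covers every edge. That is the kind of gap a relation-only coverage number would show.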