bjascob / amrlib

A python library that makes AMR parsing, generation and visualization simple.
MIT License

Unsupported multiple concepts or values in different nodes because of set operation #70

Closed: AbdiHaryadi closed this issue 1 day ago

AbdiHaryadi commented 3 weeks ago

I found this part of the implementation in amrlib/evaluate/smatch_enhanced.py:

def compute_subscores(pred, gold):
    ...
    # Loop through all entries
    for amr_pred, amr_gold in zip(pred, gold):
        ...
        # Wikification scores
        list_pred = wikification(triples_pred)
        list_gold = wikification(triples_gold)
        inters["Wikification"] += len(list(set(list_pred) & set(list_gold)))
        preds["Wikification"] += len(set(list_pred))
        golds["Wikification"] += len(set(list_gold))
        ...

As you can see, inters["Wikification"] is computed as the size of the intersection of the pred set and the gold set. However, converting to sets means an AMR with multiple nodes that refer to the same wiki is not handled correctly: the duplicates collapse into a single entry and are counted only once.

Consider this example. Here is the content of gold.txt:

( m / multi-sentence
    :snt1 ( p / person
        :wiki "Barack_Obama"
        :name ( n / name
            :op1 "Obama" ) )
    :snt2 ( p2 / person
        :wiki "Barack_Obama"
        :name ( n2 / name
            :op1 "Barack"
            :op2 "Obama" ) ) )

Example sentence: Obama. Barack Obama.

Here is the content of pred.txt:

( p / person
    :wiki "Barack_Obama"
    :name ( n / name
        :op1 "Obama" ) )

There are two wikis in gold.txt but only one in pred.txt, and all of them refer to the same person. I scored these files with the compute_scores function, and the result is 1.000 for Wikification recall. I'm not sure whether this is expected, but I think the result should be 0.500, since the prediction contains only one of the two gold wikis.
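The arithmetic behind those two numbers can be reproduced without amrlib. This is only a sketch: it assumes wikification() returns one wiki value per :wiki triple, so the lists below are hand-written stand-ins for its output on gold.txt and pred.txt, not values taken from the library.

```python
# Assumption: wikification() yields one wiki value per :wiki triple.
# These lists are hand-written stand-ins, not real library output.
list_gold = ["Barack_Obama", "Barack_Obama"]  # two :wiki nodes in gold.txt
list_pred = ["Barack_Obama"]                  # one :wiki node in pred.txt

# Same set-based counting as in compute_subscores
inter = len(set(list_pred) & set(list_gold))
gold_total = len(set(list_gold))   # duplicates collapse to a single entry
set_recall = inter / gold_total

# Counting every gold node separately instead
expected_recall = inter / len(list_gold)

print(f"set-based recall: {set_recall:.3f}")
print(f"per-node recall:  {expected_recall:.3f}")
```

The gold total drops from 2 to 1 once the list becomes a set, which is where the 1.000 comes from.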

If this result is unexpected, I think the set operations should be replaced with multisets (frequency tables), so that the case I've described is supported.
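A minimal sketch of the multiset idea, using collections.Counter from the standard library. The helper name multiset_counts is hypothetical and not part of amrlib, and the input lists are again assumed stand-ins for what wikification() returns.

```python
from collections import Counter

def multiset_counts(list_pred, list_gold):
    """Hypothetical helper: return (intersection, pred_total, gold_total)
    while keeping duplicate entries instead of collapsing them."""
    count_pred = Counter(list_pred)
    count_gold = Counter(list_gold)
    # Counter & Counter keeps the minimum count of each shared item
    inter = sum((count_pred & count_gold).values())
    return inter, sum(count_pred.values()), sum(count_gold.values())

inter, n_pred, n_gold = multiset_counts(
    ["Barack_Obama"],                  # pred: one wiki node
    ["Barack_Obama", "Barack_Obama"],  # gold: two wiki nodes
)
precision = inter / n_pred
recall = inter / n_gold
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With these inputs the intersection is 1, the pred total is 1 and the gold total is 2, so recall becomes 0.500 while precision stays at 1.000.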

Edit: I only showed the Wikification case as an example. The issue affects all compute_subscores metrics, from Non_sense_frames to Frames.

nschneid commented 3 weeks ago

I take it the problem is specific to multi-sentence, where there are different nodes for the same entity in different sentences. Within a sentence, a wikified entity should only have one variable.