Recall score for the kept tokens in SARI

ekQ commented 6 years ago

Thank you for making this code available! I was trying to understand how the different components of the SARI score are computed, and I wonder if I've misunderstood something or if there's an inconsistency between the code and the paper. Consider the following example.

Input: "a b" Output: "b" Ref-1: "a b" Ref-2: "a"

Now if I manually compute the recall of kept tokens using Eq. 5 from the paper, I get

    r_{keep}(1) = [min(0, 1) + min(1, 1/2)] / [1 + 1/2] = 1/3,

where the first terms of the numerator and denominator correspond to "a" and the second terms to "b". However, the GitHub implementation gives me

    r_{keep}(1) = 1/2.

The reason is that in the code the terms of the numerator are divided individually by the corresponding denominator terms on line 58, instead of dividing the sum of the numerator terms by the sum of the denominator terms as done in Eq. 5 in the paper.

Replacing line 58 by:

    keeptmpscore2 += keepgramcountergood_rep[keepgram]

and line 65 by:

    keepscore_recall = keeptmpscore2 / sum(keepgramcounterall_rep.values())

seems to fix this and yield p_{keep}(1) = 1/3 as I would expect it to yield.

Have I missed something? Thanks in advance!

cocoxu commented 6 years ago

We inflated the counts by number of references in order to achieve the weighting of R' more conveniently. We weighted each ngram in calculating the numerator keeptmpscore1 and keeptmpscore2, then divided by len(keepgramcounterall_rep) instead of weighted sum (which will weigh each ngram differently). In this way, as the current code, we treat each ngram equally in the weighting for calculating the denominator -- we found this more robust and worked well in practice.

ekQ commented 6 years ago

Thanks for the quick reply! I understand the idea of inflating the counts, but it's not clear to me why, on line 58

keeptmpscore2 += keepgramcountergood_rep[keepgram] / keepgramcounterall_rep[keepgram]

the added terms are divided by keepgramcounterall_rep[keepgram]. Since the adjusted R' scores are incorporated both into keepgramcountergood_rep and into keepgramcounterall_rep, it seems that the division cancels out their effect, and in the end, only 1s get added to the numerator keeptmpscore2. In other words, it seems that r_{keep}(n) is effectively computed using R instead of R'.

cocoxu / simplification

Recall score for the kept tokens in SARI #6