cocoxu / simplification

Text Simplification System and Dataset
GNU General Public License v3.0

Difference between paper equations and code #8

Open feralvam opened 5 years ago

feralvam commented 5 years ago

My understanding of Equation 7 in the paper is that you compute the precision and recall for each ngram order and then average them over the ngram orders (the maximum order is 4). Only after that do you calculate the F1 score of each operation, and then compute SARI/STAR by averaging them:

add_precision = (add_precision_1 + add_precision_2 + add_precision_3 + add_precision_4) / 4
add_recall = (add_recall_1 + add_recall_2 + add_recall_3 + add_recall_4) / 4
add_f1 = 2 * add_precision * add_recall / (add_precision + add_recall)

keep_precision = (keep_precision_1 + keep_precision_2 + keep_precision_3 + keep_precision_4) / 4
keep_recall = (keep_recall_1 + keep_recall_2 + keep_recall_3 + keep_recall_4) / 4
keep_f1 = 2 * keep_precision * keep_recall / (keep_precision + keep_recall)

del_precision = (del_precision_1 + del_precision_2 + del_precision_3 + del_precision_4) / 4

sari = (add_f1 + keep_f1 + del_precision) / 3

However, the code follows a different procedure. There, an F1 score is computed for each operation at each ngram order. These per-order F1 scores are averaged over the ngram orders, and the per-operation results are summed and divided by 3 (the number of operations) at the end.

add_f1_1 = 2 * add_precision_1 * add_recall_1 / (add_precision_1 + add_recall_1)
add_f1_2 = 2 * add_precision_2 * add_recall_2 / (add_precision_2 + add_recall_2)
add_f1_3 = 2 * add_precision_3 * add_recall_3 / (add_precision_3 + add_recall_3)
add_f1_4 = 2 * add_precision_4 * add_recall_4 / (add_precision_4 + add_recall_4)

add_1 = (add_f1_1 + add_f1_2 + add_f1_3 + add_f1_4) / 4

keep_f1_1 = 2 * keep_precision_1 * keep_recall_1 / (keep_precision_1 + keep_recall_1)
keep_f1_2 = 2 * keep_precision_2 * keep_recall_2 / (keep_precision_2 + keep_recall_2)
keep_f1_3 = 2 * keep_precision_3 * keep_recall_3 / (keep_precision_3 + keep_recall_3)
keep_f1_4 = 2 * keep_precision_4 * keep_recall_4 / (keep_precision_4 + keep_recall_4)

keep_1 = (keep_f1_1 + keep_f1_2 + keep_f1_3 + keep_f1_4) / 4

del_precision = (del_precision_1 + del_precision_2 + del_precision_3 + del_precision_4) / 4

sari = (add_1 + keep_1 + del_precision) / 3

These are not mathematically equivalent, so the two ways of calculating the metric produce different scores. Which is the correct procedure, then: the one in the paper or the one in the code?
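To make the difference concrete, here is a minimal Python sketch of the two procedures as described above. The function names are hypothetical and not taken from the repository, the per-order precision/recall values are made up purely for illustration, and the zero-denominator guard in `f1` is my own assumption rather than something confirmed from the paper or the code.

```python
def f1(p, r):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero (assumed guard)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def sari_paper_style(add_p, add_r, keep_p, keep_r, del_p):
    # Average precision and recall over the ngram orders first, then take F1.
    add_f1 = f1(mean(add_p), mean(add_r))
    keep_f1 = f1(mean(keep_p), mean(keep_r))
    return (add_f1 + keep_f1 + mean(del_p)) / 3


def sari_code_style(add_p, add_r, keep_p, keep_r, del_p):
    # Take F1 at each ngram order first, then average the per-order F1 scores.
    add_avg = mean([f1(p, r) for p, r in zip(add_p, add_r)])
    keep_avg = mean([f1(p, r) for p, r in zip(keep_p, keep_r)])
    return (add_avg + keep_avg + mean(del_p)) / 3


# Made-up values for ngram orders 1..4 (illustration only, not real system output).
add_p = [1.0, 0.0, 0.5, 0.5]
add_r = [0.0, 1.0, 0.5, 0.5]
keep_p = [0.5, 0.5, 0.5, 0.5]
keep_r = [0.5, 0.5, 0.5, 0.5]
del_p = [0.5, 0.5, 0.5, 0.5]

paper_score = sari_paper_style(add_p, add_r, keep_p, keep_r, del_p)
code_score = sari_code_style(add_p, add_r, keep_p, keep_r, del_p)
print(paper_score, code_score)
```

With these toy inputs the first procedure gives 0.5 and the second gives roughly 0.417. The gap comes entirely from the add operation: its averaged precision and recall are both 0.5, but at orders 1 and 2 the per-order F1 is zero, which pulls the averaged F1 down to 0.25.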

Thanks for your help and clarification.

cocoxu commented 5 years ago

The code is what was used for the experiments in my paper and in others. Many implementation details are not included in the paper, hence the release of the code.