Understanding adequacy metric

cegersdoerfer commented 2 years ago

Hi, I have been using the filters file from this repo to evaluate paraphrases I created with several different models, but the adequacy score gives some unexpected results, so I was wondering if you could tell me more about how it was trained.

If the paraphrase and the original are exactly the same, the adequacy is quite low (around 0.7-0.8). If the paraphrase is shorter or longer than the original, it generally gets a much higher score. For example, with the original "I need to buy a house in the neighborhood":

- Paraphrase "I need to buy a house" scores 0.98.
- Paraphrase "I need to buy a house in the neighborhood where I want to live" scores even higher, at 0.99.
- Paraphrase "I need to buy a house in the neighborhood" (identical to the original) scores 0.7.
- The same identical sentence with a period at the end scores 0.8.

This makes me think that the adequacy model somehow takes into account how much the new sentence has changed from the original, in addition to how well its meaning was preserved. Since the ReadMe states that adequacy measures whether the paraphrase preserves the meaning of the original, it is confusing to me that using the same sentence for both original and paraphrase does not get a high score. Could you clarify?
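The expectation above (an identical sentence should score high) can be written down as a small sanity check. This is only a sketch: `check_identity_pairs`, `Scorer`, and `toy_scorer` are hypothetical names made up for illustration, and `toy_scorer` is a token-overlap stand-in, not the Parrot adequacy model. Any function that takes (original, paraphrase) and returns a score could be plugged in instead.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer signature: (original, paraphrase) -> score in [0, 1].
Scorer = Callable[[str, str], float]


def check_identity_pairs(
    score_fn: Scorer, sentences: List[str], min_score: float = 0.9
) -> List[Tuple[str, float]]:
    """Return (sentence, score) for every sentence whose score against
    itself falls below min_score."""
    failures = []
    for sentence in sentences:
        score = score_fn(sentence, sentence)
        if score < min_score:
            failures.append((sentence, score))
    return failures


def toy_scorer(original: str, paraphrase: str) -> float:
    # Stand-in scorer (Jaccard overlap of lowercased tokens).
    # This is NOT the adequacy model; it only makes the sketch runnable.
    a = set(original.lower().split())
    b = set(paraphrase.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


sentences = ["I need to buy a house in the neighborhood"]
failures = check_identity_pairs(toy_scorer, sentences)
```

An empty `failures` list means every sentence scored at least `min_score` against itself; the 0.9 threshold is an arbitrary choice for the sketch.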

PrithivirajDamodaran commented 2 years ago

The model is framed as a sentence pair regression task. By design, it preserves the intent to the core, and even if the intent changes slightly, the model is sensitive enough to catch it. Look at example 1.
Don't conflate this with other sentence pair regression tasks like STS or STS2 or STS5. The strange behavior was due to a bug in how the raw logits out of the model were handled. Fixed it. (With or without a period, you shouldn't see a different score; any difference should be negligible.)
Now it should work. Look at example 2 (yours).
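To illustrate why raw logits need care before being read as a score, here is a minimal sketch. The thread does not show the actual fix, so everything here is an assumption: a two-class head, softmax normalization, and the example logit values are all made up for illustration, and the repo may do something different.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw logits for the classes [not adequate, adequate].
raw_logits = [-1.2, 2.3]

# Raw logits are unbounded, so they cannot be compared to a 0-1 threshold.
probs = softmax(raw_logits)

# Assumption: index 1 is the "adequate" class.
adequacy_score = probs[1]
```

After normalization the score always lies between 0 and 1, which is what a threshold-based filter expects.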

cegersdoerfer commented 2 years ago

Thank you @PrithivirajDamodaran for clearing that up and fixing the bug!

PrithivirajDamodaran / Parrot_Paraphraser

Understanding adequacy metric #18