PrithivirajDamodaran / Parrot_Paraphraser

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
Apache License 2.0
866 stars 141 forks source link

Understanding adequacy metric #18

Closed cegersdoerfer closed 2 years ago

cegersdoerfer commented 2 years ago

Hi, I have been using the filters file from this repo to experiment on evaluating some paraphrases I created using various different models, but I noticed that the adequacy score gives some unexpected results so I was wondering if you could tell me some more about how it was trained? I noticed that if the paraphrase and the original are the exact same, the adequacy is quite low (around 0.7-0.80). If the paraphrase is shorter or longer than the original, it generally has a much higher score. Ex. Original: "I need to buy a house in the neighborhood" -> Paraphrase: "I need to buy a house" the paraphrase has a score of 0.98. Paraphrase: "I need to buy a house in the neighborhood where I want to live" results in an even higher score of .99 while the paraphrase "I need to buy a house in the neighborhood" (which is the same exact sentence as the original) gets a score of 0.7 and the same sentence with a period at the end gets 0.8. This makes me think that the adequacy model takes into account how much the new sentence has changed from the original in addition to how well its meaning was preserved in some way. Since the ReadMe states that adequacy measures whether or not the paraphrase preserves the meaning of the original, it is confusing to me that using the same sentence for original and paraphrase does not get a high score, could you clarify?

PrithivirajDamodaran commented 2 years ago
Screenshot 2022-06-25 at 9 25 12 AM
cegersdoerfer commented 2 years ago

Thank you @PrithivirajDamodaran for clearing that up and fixing the bug!