Interpreting BLEURT scores

google-research / bleurt

BLEURT is a metric for Natural Language Generation based on transfer learning.

https://arxiv.org/abs/2004.04696

Apache License 2.0

Interpreting BLEURT scores #41

Closed: marcoavagnano98 closed this issue 2 years ago

marcoavagnano98 commented 2 years ago

Hi, sorry for the stupid question; maybe you have already answered this. You wrote: "In practice however, the answers tend to be very correlated with fluency ("Is the text fluent English?"), and we added synthetic noise in the training set which makes the distinction between adequacy and fluency somewhat fuzzy". I'm a little confused by this. Ultimately, which aspects of the text does the metric evaluate in a translation task? Thanks, and sorry for my bad English ;)

tsellam commented 2 years ago

Hi, thanks for your feedback; this is actually a very good question. BLEURT depends on both fluency and adequacy, and unfortunately it's still unclear which of the two has more influence. Two great papers on the topic:
https://aclanthology.org/2021.emnlp-main.575.pdf
https://aclanthology.org/2021.emnlp-main.701.pdf
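
If you want a rough feel for how the two aspects play out on your own data, one option is a small probe like the sketch below. It is only an illustration, not something from the BLEURT codebase or the papers above: the helper names (`shuffle_words`, `probe`, `toy_overlap_score`) are hypothetical, and the toy scorer is a stand-in so the snippet runs without a checkpoint. The idea is to score three candidates against the same reference: an identical copy, a word-shuffled copy (same content, poor fluency), and a fluent sentence with a different meaning (poor adequacy).

```python
import random
from typing import Callable, Dict, List, Sequence

# Any scorer with this shape can be plugged in: it takes parallel lists of
# references and candidates and returns one score per pair.
ScoreFn = Callable[[List[str], List[str]], Sequence[float]]


def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Fluency perturbation: keep the same words but scramble their order."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def probe(score_fn: ScoreFn, reference: str, fluent_wrong: str,
          seed: int = 0) -> Dict[str, float]:
    """Score three candidates against one reference.

    `fluent_wrong` is a hand-written sentence that is fluent English but
    does not convey the reference's meaning (an adequacy perturbation).
    """
    candidates = [reference, shuffle_words(reference, seed), fluent_wrong]
    scores = score_fn([reference] * len(candidates), candidates)
    return {
        "identical": float(scores[0]),
        "disfluent_same_words": float(scores[1]),
        "fluent_wrong_meaning": float(scores[2]),
    }


def toy_overlap_score(references: List[str],
                      candidates: List[str]) -> List[float]:
    """Stand-in scorer (NOT BLEURT): share of reference tokens in the candidate.

    It ignores word order, so it is blind to fluency by construction.
    """
    scores = []
    for ref, cand in zip(references, candidates):
        ref_tokens = set(ref.lower().split())
        cand_tokens = set(cand.lower().split())
        overlap = len(ref_tokens & cand_tokens)
        scores.append(overlap / len(ref_tokens) if ref_tokens else 0.0)
    return scores


if __name__ == "__main__":
    result = probe(
        toy_overlap_score,
        reference="The cat sat on the mat.",
        fluent_wrong="The dog slept under the table.",
    )
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

To try this with BLEURT itself, you would replace `toy_overlap_score` with a function that wraps `score.BleurtScorer(checkpoint).score(references=..., candidates=...)` from the `bleurt` package, assuming you have a checkpoint downloaded. A single sentence proves little; the gap between the "disfluent" and "wrong meaning" scores only becomes informative when averaged over many examples.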