Closed · ZhangShiyue closed this issue 3 years ago

ZhangShiyue:
Hi Daniel,
It looks like you used a single responsiveness score to compute the correlations for the Fabbri2020 data: https://github.com/danieldeutsch/sacrerouge/blob/master/doc/datasets/fabbri2020.md
I wanted to ask which responsiveness score you used, since the dataset has four different ratings (coherence, etc.) and each rating has multiple raters.
Thanks, Shiyue

danieldeutsch:
I used the averaged "relevance" judgment by the expert annotators. This file has an example of how I calculated the correlations for QAEval. This is the key line, which picks the ground-truth metric and can be changed depending on your use case: https://github.com/danieldeutsch/sacrerouge/blob/b4d86a0254c427ad661a9c3611d954482bcf003e/experiments/qaeval/run-fabbri2020.sh#L15

ZhangShiyue:
Got it. Thanks!
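For readers landing here, the approach described above (average the expert annotators' "relevance" ratings per summary, then correlate those averages with the metric's scores) can be sketched roughly as follows. This is only an illustration, not SacreROUGE's actual implementation: the data layout and the `pearson` helper are hypothetical, and the linked script handles summary- and system-level correlations in more detail.

```python
from statistics import mean

def average_expert_relevance(annotations):
    # Average the "relevance" rating across expert annotators for one summary.
    # `annotations` is a hypothetical list of per-annotator dicts.
    return mean(a["relevance"] for a in annotations)

def pearson(xs, ys):
    # Plain Pearson correlation, no external dependencies.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy data: three summaries, each judged by two hypothetical experts,
# alongside made-up metric scores for the same summaries.
expert = [
    [{"relevance": 4}, {"relevance": 5}],
    [{"relevance": 2}, {"relevance": 3}],
    [{"relevance": 5}, {"relevance": 4}],
]
metric = [0.71, 0.32, 0.65]

human = [average_expert_relevance(a) for a in expert]  # [4.5, 2.5, 4.5]
print(round(pearson(human, metric), 3))  # → 0.99
```

In practice one would compute this with `scipy.stats.pearsonr` (or `kendalltau`/`spearmanr`) rather than a hand-rolled helper; the pure-Python version is shown only to keep the sketch dependency-free.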