Closed · ZhangShiyue closed this issue 3 years ago

ZhangShiyue:
Hi Daniel,
It looks like you used a single responsiveness score to compute the correlations for the Fabbri2020 data: https://github.com/danieldeutsch/sacrerouge/blob/master/doc/datasets/fabbri2020.md
I wanted to ask which responsiveness score you used, since the dataset has four different ratings (coherence, etc.) and each rating has multiple raters.
Thanks, Shiyue

danieldeutsch:
I used the averaged "relevance" judgment by the expert annotators. This file has an example of how I calculated the correlations for QAEval. This is the key line, which picks the ground-truth metric and can be changed depending on your use case: https://github.com/danieldeutsch/sacrerouge/blob/b4d86a0254c427ad661a9c3611d954482bcf003e/experiments/qaeval/run-fabbri2020.sh#L15

ZhangShiyue:
Got it. Thanks!
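For readers landing here, the approach described above (average the expert annotators' "relevance" ratings per summary, then correlate those averages with the metric's scores) can be sketched roughly as follows. This is only an illustration, not SacreROUGE's actual implementation: the data layout and the `pearson` helper are hypothetical, and the linked script handles summary- and system-level correlations in more detail.

```python
from statistics import mean

def average_expert_relevance(annotations):
    # Average the "relevance" rating across expert annotators for one summary.
    # `annotations` is a hypothetical list of per-annotator dicts.
    return mean(a["relevance"] for a in annotations)

def pearson(xs, ys):
    # Plain Pearson correlation, no external dependencies.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy data: three summaries, each judged by two hypothetical experts,
# alongside made-up metric scores for the same summaries.
expert = [
    [{"relevance": 4}, {"relevance": 5}],
    [{"relevance": 2}, {"relevance": 3}],
    [{"relevance": 5}, {"relevance": 4}],
]
metric = [0.71, 0.32, 0.65]

human = [average_expert_relevance(a) for a in expert]  # [4.5, 2.5, 4.5]
print(round(pearson(human, metric), 3))  # → 0.99
```

In practice one would compute this with `scipy.stats.pearsonr` (or `kendalltau`/`spearmanr`) rather than a hand-rolled helper; the pure-Python version is shown only to keep the sketch dependency-free.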