google-research / bleurt

BLEURT is a metric for Natural Language Generation based on transfer learning.
https://arxiv.org/abs/2004.04696
Apache License 2.0
697 stars 85 forks source link

How to calculate overall bleurt score? #28

Closed shankyemcee closed 3 years ago

shankyemcee commented 3 years ago

HI,

When i run the code on my own dataset I get the scores file back with a score calculated for each sample. Am i supposed to take average on all the scores generated?

My dataset consists of the reference file with a single line per sample, and a generated summary file also with a single line per sample. I am doing data to text generation, and the reference file is all the expected output summaries, and the generated summary file is all the actual output summaries.