danieldeutsch / sacrerouge

SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
Apache License 2.0

Should I scale the outputs for SumQE models from 0-1 by 5? #127

Closed: saiprabhakar closed this issue 2 years ago

saiprabhakar commented 2 years ago

Hi, I am trying to evaluate my summarization model with your SumQE metrics.

The SumQE paper has tables with values for Q1-Q5 in the 1-5 range, but the outputs from the models (using the example from the README in the SumQE repo) mostly fall in the 0-1 range (one value was 1.03).

Am I missing something or is it just scaling?

Thanks

danieldeutsch commented 2 years ago

I believe what is happening is that the final output layer from the original code has no activation function to force the output to be between 0 and 1 (see here), so sometimes the value could be outside of this range. I am not an author of that work, but I would suggest clipping values > 1 to be 1 and < 0 to be 0.

saiprabhakar commented 2 years ago

Yep, that makes sense. Also, to close the loop on my question: I noticed that the training data prep code divides the human ratings by 5. So to get back to the Likert scale, we have to multiply the outputs by 5 (after clipping them to [0, 1] as you suggested), which gives values in [0, 5].
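
For anyone landing here later, a minimal sketch of that post-processing in plain Python. The function name `rescale_sumqe` and the input list are hypothetical; they are not part of SacreROUGE or the SumQE code. It assumes you have already collected the raw predictions for one quality dimension as a list of floats:

```python
def rescale_sumqe(scores, scale=5.0):
    """Clip each raw score to [0, 1], then multiply by `scale`.

    `scale=5.0` assumes the training targets were human ratings divided by 5,
    as described above. Hypothetical helper, not a library function.
    """
    return [min(max(s, 0.0), 1.0) * scale for s in scores]


# Illustrative raw outputs: one in range, one above 1, one below 0.
raw = [0.5, 1.03, -0.02]
print(rescale_sumqe(raw))
```

Clipping before scaling keeps an out-of-range prediction such as 1.03 from turning into a value above 5.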