TL;DR
Document summarization is typically evaluated with ROUGE scores, but ROUGE does not reliably reflect the actual quality of summaries. In this study, the authors generate human-preferred summaries via reinforcement learning, using human preference judgments as the reward signal. The resulting model outperforms existing models in human evaluation. Furthermore, it is robust to domain shift (trained on Reddit, evaluated on CNN/DM).
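As a rough illustration of the approach described above (a sketch, not the authors' code), reward models in such pipelines are commonly trained on pairwise human comparisons with a Bradley-Terry-style loss: the reward assigned to the human-preferred summary should exceed that of the rejected one. Function and variable names here are hypothetical:

```python
import math

def preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Pairwise preference loss for reward-model training:
    loss = -log(sigmoid(r_preferred - r_rejected)).
    Small when the preferred summary already scores higher."""
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ordered pair incurs less loss than a misordered one:
assert preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0)
# With equal rewards the loss is log(2), i.e. maximal uncertainty:
assert abs(preference_loss(0.0, 0.0) - math.log(2.0)) < 1e-9
```

The trained reward model then scores sampled summaries, and the summarization policy is fine-tuned with reinforcement learning to maximize that score.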
Paper URL
https://arxiv.org/abs/2009.01325
Submission Date (yyyy/mm/dd)
2020/09/02
Authors and institutions
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano (OpenAI)
Methods
Results
Comments