TL;DR
Document summarization is typically evaluated with ROUGE scores, but ROUGE does not reliably reflect the actual quality of summaries. In this study, the authors generate human-preferred summaries via reinforcement learning, using human preference judgments as the reward signal. The resulting model outperforms existing models in human evaluation. Furthermore, it is robust to domain shift (trained on Reddit, evaluated on CNN/DM).
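As a rough illustration of the approach described above (a sketch, not the authors' code), reward models in such pipelines are commonly trained on pairwise human comparisons with a Bradley-Terry-style loss: the reward assigned to the human-preferred summary should exceed that of the rejected one. Function and variable names here are hypothetical:

```python
import math

def preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Pairwise preference loss for reward-model training:
    loss = -log(sigmoid(r_preferred - r_rejected)).
    Small when the preferred summary already scores higher."""
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ordered pair incurs less loss than a misordered one:
assert preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0)
# With equal rewards the loss is log(2), i.e. maximal uncertainty:
assert abs(preference_loss(0.0, 0.0) - math.log(2.0)) < 1e-9
```

The trained reward model then scores sampled summaries, and the summarization policy is fine-tuned with reinforcement learning to maximize that score.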
Paper URL
https://arxiv.org/abs/2009.01325
Submission Date (yyyy/mm/dd)
2020/09/02
Authors and institutions
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano (OpenAI)
Methods
Results
Comments