AkiraTOSEI / ML_papers

ML_paper_summary(in Japanese)

Learning to summarize from human feedback #109

Open AkiraTOSEI opened 3 years ago

AkiraTOSEI commented 3 years ago

TL;DR

ROUGE is the usual metric for evaluating document summaries, but it does not accurately reflect summary quality. This study generates human-preferred summaries with reinforcement learning, using human evaluations as the reward. The resulting model outperforms existing models in human evaluation. It is also robust to a change of data domain (trained on Reddit, evaluated on CNN/DM).

[Figure: 1. Collect human feedback]
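To make "human evaluations as the reward" concrete: the paper trains a reward model on pairwise human comparisons between summaries, and that model then supplies the RL reward. Below is a minimal sketch of the pairwise preference loss only, not the authors' implementation. The function name `preference_loss` and the scalar reward inputs are illustrative assumptions; in practice the rewards would come from a neural network that scores a (post, summary) pair.

```python
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(r_preferred - r_rejected)).

    Hypothetical helper for illustration. The loss is small when the reward
    model scores the human-preferred summary higher than the rejected one.
    """
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(d)) == log(1 + exp(-d)); branch to avoid overflow in exp().
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Toy reward-model scores for two candidate summaries of the same post.
agrees_with_human = preference_loss(reward_preferred=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_preferred=0.0, reward_rejected=2.0)

# The loss is lower when the model's ranking matches the human label.
print(agrees_with_human < disagrees_with_human)
```

Minimizing this loss over many labeled comparisons pushes the reward model to rank summaries the way the labelers did, which is what lets it stand in for ROUGE as the training signal.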

Paper URL

https://arxiv.org/abs/2009.01325

Submission Date (yyyy/mm/dd)

2020/09/02

Authors and institutions

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano

Methods

Results

Comments