Abstractive summarization models are commonly trained using maximum likelihood estimation, which assumes a deterministic (one-point) target distribution in which an ideal model will assign all the probability mass to the reference summary. This assumption may lead to performance degradation during inference, where the model needs to compare several system-generated (candidate) summaries that have deviated from the reference summary. To address this problem, we propose a novel training paradigm which assumes a non-deterministic distribution so that different candidate summaries are assigned probability mass according to their quality. Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets. Further analysis also shows that our model can estimate probabilities of candidate summaries that are more correlated with their level of quality.
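The idea of assigning probability mass to candidate summaries in proportion to their quality can be realized with a pairwise ranking objective: candidates are ordered by a quality metric (e.g., ROUGE against the reference), and the model is penalized whenever a worse candidate receives a higher score than a better one. The sketch below is an illustrative assumption of such a loss, not the paper's actual implementation; `cand_scores` stands in for the model's (length-normalized) log-probabilities of each candidate.

```python
def ranking_loss(cand_scores, margin=0.01):
    """Pairwise hinge loss over candidate scores sorted best-to-worst
    by quality. The loss is zero when every higher-quality candidate
    outscores every lower-quality one by at least (rank gap) * margin.

    cand_scores: list of floats, model scores, index 0 = best candidate.
    """
    loss = 0.0
    n = len(cand_scores)
    for i in range(n):
        for j in range(i + 1, n):
            # Candidate i is higher quality than candidate j, so we want
            # cand_scores[i] >= cand_scores[j] + (j - i) * margin.
            gap = (j - i) * margin
            loss += max(0.0, cand_scores[j] - cand_scores[i] + gap)
    return loss
```

With correctly ordered scores such as `[0.5, 0.3, 0.1]` the loss is zero; an inverted pair such as `[0.1, 0.3]` incurs a positive penalty, pushing probability mass toward higher-quality candidates during training.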