Evaluation has long been critical to the success of document summarization systems. Previous approaches, such as ROUGE, mainly consider the informativeness of the assessed summary and require human-generated references for each test summary. In this work, we propose to evaluate summary quality without reference summaries via unsupervised contrastive learning. Specifically, we design a new BERT-based metric that covers both linguistic quality and semantic informativeness. To learn the metric, for each summary we construct different types of negative samples with respect to different aspects of summary quality, and train our model with a ranking loss. Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries. Furthermore, we show that our method is general and transferable across datasets.
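The training objective described above can be sketched with a standard margin ranking loss: the scorer is pushed to rate each original summary at least a margin above its constructed negative samples. This is a minimal illustrative sketch, not the paper's implementation; the scores, margin value, and function names are assumptions for demonstration.

```python
import numpy as np

def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge-style ranking loss: penalizes cases where a negative
    sample's score is not at least `margin` below the score of the
    corresponding positive (original) summary."""
    return np.maximum(0.0, margin - (pos_scores - neg_scores)).mean()

# Hypothetical quality scores from a BERT-based scorer (illustrative values):
pos = np.array([0.9, 0.8])   # scores for the original summaries
neg = np.array([0.2, 0.7])   # scores for constructed negative samples
loss = margin_ranking_loss(pos, neg)
```

In practice, one negative set would target linguistic quality (e.g., perturbed word order) and another semantic informativeness (e.g., content deletion), with the same ranking objective applied to each.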