AkihikoWatanabe commented 1 year ago

Recent work in the field of automatic summarization and headline generation focuses on maximizing ROUGE scores for various news datasets. We present an alternative, extrinsic, evaluation metric for this task, Answering Performance for Evaluation of Summaries. APES utilizes recent progress in the field of reading-comprehension to quantify the ability of a summary to answer a set of manually created questions regarding central entities in the source article. We first analyze the strength of this metric by comparing it to known manual evaluation metrics. We then present an end-to-end neural abstractive model that maximizes APES, while increasing ROUGE scores to competitive results.
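As a rough illustration of the idea in the abstract, the sketch below scores a summary by the fraction of entity questions that a reader can answer from the summary alone. Everything here is an assumption made for illustration: the name `apes_like_score`, the cloze-style `@placeholder` question format, exact-match scoring, and the toy substring-matching reader that stands in for a trained reading-comprehension model. It is not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple

# A reader takes (summary, question) and returns an answer string.
Reader = Callable[[str, str], str]


def apes_like_score(summary: str,
                    qa_pairs: Sequence[Tuple[str, str]],
                    reader: Reader) -> float:
    """Fraction of questions answered correctly using only the summary.

    Assumption: an answer counts as correct on a case-insensitive exact match.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        1 for question, gold in qa_pairs
        if reader(summary, question).strip().lower() == gold.strip().lower()
    )
    return correct / len(qa_pairs)


def make_toy_reader(candidates: Sequence[str]) -> Reader:
    """Hypothetical stand-in for a QA model: fill the placeholder with each
    candidate entity and accept the first filled question found in the summary."""
    def reader(summary: str, question: str) -> str:
        text = summary.lower()
        for entity in candidates:
            filled = question.replace("@placeholder", entity).lower()
            if filled in text:
                return entity
        return ""  # the summary does not support any candidate
    return reader


summary = ("Alice Smith won the mayoral election. "
           "The election took place in Springfield.")
qa_pairs = [
    ("@placeholder won the mayoral election", "Alice Smith"),
    ("The election took place in @placeholder", "Springfield"),
    ("@placeholder conceded the race", "Bob Jones"),
]
toy_reader = make_toy_reader(["Alice Smith", "Bob Jones", "Springfield"])
score = apes_like_score(summary, qa_pairs, toy_reader)
print(f"{score:.2f}")  # 2 of 3 questions are answerable from this summary
```

Passing the reader in as a callable keeps the scoring logic separate from the QA model, so a real reading-comprehension model could be swapped in without changing `apes_like_score`.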

Translation (by gpt-3.5-turbo)

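Recent research on automatic summarization and headline generation has focused on maximizing ROUGE scores on various news datasets. This work proposes an alternative, extrinsic evaluation metric for the task: Answering Performance for Evaluation of Summaries (APES). APES draws on recent progress in reading comprehension to quantify how well a summary can answer a set of manually created questions about the central entities of the source article. The authors first analyze the strength of the metric by comparing it with known manual evaluation metrics. They then propose an end-to-end neural abstractive model that maximizes APES while also raising ROUGE scores to competitive levels.

Summary (by gpt-3.5-turbo)

Recent research on automatic summarization has focused on maximizing ROUGE scores; this work instead proposes APES, an alternative evaluation metric. APES quantifies how well a summary can answer a set of manually created questions. The authors also propose an end-to-end neural abstractive model that maximizes APES while improving ROUGE scores.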

AkihikoWatanabe commented 1 year ago

APES

Question answering as an automatic evaluation metric for news article summarization, Eyal+, NAACL'19 #995
