Text summarization aims to compress long documents into a shorter form that conveys the most important parts of the original document. Despite increased interest in the community and notable research effort, progress on benchmark datasets has stagnated. We critically evaluate key ingredients of the current research setup (datasets, evaluation metrics, and models) and highlight three primary shortcomings: 1) automatically collected datasets leave the task underconstrained and may contain noise detrimental to training and evaluation, 2) the current evaluation protocol is weakly correlated with human judgment and does not account for important characteristics such as factual correctness, 3) models overfit to layout biases of current datasets and offer limited diversity in their outputs.
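The second shortcoming says that overlap-based evaluation correlates weakly with human judgment and ignores factual correctness. A small sketch can make this concrete. The code below is not from the paper, and everything in it is an assumption for illustration:

- `unigram_f1` is a hypothetical, simplified ROUGE-1-style score. It uses whitespace tokens and clipped unigram overlap, with no stemming, and it is not the official ROUGE toolkit.
- `pearson` computes the Pearson correlation. This is one common way to compare per-summary metric scores against human ratings.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1 (hypothetical helper, not the official ROUGE).

    Lowercases, splits on whitespace, and counts clipped unigram overlap;
    no stemming, stopword removal, or multi-reference handling.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter '&' keeps min counts (clipping)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between metric scores and human ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    reference = "the company reported a profit in march"
    # Hypothetical candidate that flips the key fact but keeps most words.
    wrong = "the company reported a loss in march"
    print(f"F1 of factually wrong summary: {unigram_f1(wrong, reference):.3f}")
```

In the example, swapping "profit" for "loss" reverses the meaning. The candidate still shares 6 of its 7 tokens with the reference, giving an F1 of 6/7 (about 0.86). This shows why n-gram overlap alone cannot capture factual correctness.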

Translation (by gpt-3.5-turbo)

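Text summarization aims to compress a long document into a shorter form that conveys the most important parts of the original. Despite growing interest from the community and notable research efforts, progress on benchmark datasets has stalled. We critically evaluate the main components of the current research setup (datasets, evaluation metrics, and models) and highlight three major shortcomings: 1) automatically collected datasets leave the task insufficiently constrained and may contain noise that is harmful to training and evaluation; 2) the current evaluation protocol correlates only weakly with human judgment and does not account for important properties such as factual correctness; 3) models overfit to the layout biases of current datasets, and the diversity of their outputs is limited.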
Summary (by gpt-3.5-turbo)
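Progress in text summarization research has stalled, and the paper identifies problems in three components: datasets, evaluation metrics, and models. Automatically collected datasets are insufficiently constrained and may contain noise. The evaluation protocol correlates weakly with human judgment and does not account for important properties. Models overfit to dataset biases and have limited output diversity.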

AkihikoWatanabe / paper_notes

Neural Text Summarization: A Critical Evaluation, Kryściński+ (w/ Richard Socher), EMNLP-IJCNLP'19 #996
