AkihikoWatanabe / paper_notes

Paper notes, added occasionally
https://AkihikoWatanabe.github.io/paper_notes

On Improving Summarization Factual Consistency from Natural Language Feedback, ACL'23 #841

Open AkihikoWatanabe opened 12 months ago

AkihikoWatanabe commented 12 months ago

https://virtual2023.aclweb.org/paper_P4192.html

AkihikoWatanabe commented 11 months ago

Despite the recent progress in language generation models, their outputs may not always meet user expectations. In this work, we study whether informational feedback in natural language can be leveraged to improve generation quality and user preference alignment. To this end, we consider factual consistency in summarization, the quality that the summary should only contain information supported by the input documents, as the user-expected preference. We collect a high-quality dataset, DeFacto, containing human demonstrations and informational natural language feedback consisting of corrective instructions, edited summaries, and explanations with respect to the factual consistency of the summary. Using our dataset, we study three natural language generation tasks: (1) editing a summary by following the human feedback, (2) generating human feedback for editing the original summary, and (3) revising the initial summary to correct factual errors by generating both the human feedback and edited summary. We show that DeFacto can provide factually consistent human-edited summaries and further insights into summarization factual consistency thanks to its informational natural language feedback. We further demonstrate that fine-tuned language models can leverage our dataset to improve the summary factual consistency, while large language models lack the zero-shot learning ability in our proposed tasks that require controllable text generation.
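
The three tasks described in the abstract map naturally onto text-to-text problems for a fine-tuned seq2seq model. The sketch below shows one way the inputs and outputs could be formatted; the prompt templates, the `flan-t5-base` checkpoint, and the `run` helper are illustrative assumptions, not the formulation or model actually used in the paper.

```python
# Minimal sketch of the three DeFacto tasks framed as text-to-text problems.
# Prompt formats and the checkpoint are illustrative assumptions, not the paper's setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"  # placeholder checkpoint, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def run(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

document = "..."   # source document
summary = "..."    # initial (possibly inconsistent) summary
feedback = "..."   # corrective instruction in natural language

# (1) Editing: revise the summary by following human feedback.
edited = run("Edit the summary so it is faithful to the document.\n"
             f"Document: {document}\nSummary: {summary}\nFeedback: {feedback}")

# (2) Critiquing: generate the corrective feedback itself.
generated_feedback = run("Describe the factual errors in the summary.\n"
                         f"Document: {document}\nSummary: {summary}")

# (3) Revision without given feedback: generate feedback and the corrected summary jointly.
feedback_and_edit = run("Explain the factual errors, then rewrite the summary.\n"
                        f"Document: {document}\nSummary: {summary}")
```

The abstract's finding that fine-tuned models benefit from the dataset while large models struggle zero-shot would correspond, in this framing, to comparing a model fine-tuned on such input-output pairs against the same prompts sent to an off-the-shelf LLM.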

Translation (by gpt-3.5-turbo)