AkihikoWatanabe commented 1 year ago

Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings.

Summary (by gpt-3.5-turbo)
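This work proposes a general framework for text summarization built on BERT, a recent pretrained language model. For the extractive model, a new encoder is introduced to obtain sentence representations. For abstractive summarization, the mismatch between the encoder and the decoder is alleviated by using different optimizers for the two. A two-stage fine-tuning approach further improves summary quality. Experiments show that the proposed method achieves state-of-the-art results.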

AkihikoWatanabe commented 1 year ago

The BERTSUMEXT paper.

AkihikoWatanabe commented 1 year ago

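Compared with the standard BERT architecture, a [CLS] token is inserted at the start of every sentence and the segment embeddings alternate from one sentence to the next, so that a representation can be obtained for each sentence. Inter-sentence Transformer layers are then stacked on top of the embeddings at each encoded sentence's [CLS] position, and every sentence is scored with a sigmoid. This is BERTSUMEXT.

The abstractive model uses a 6-layer Transformer decoder, which is trained from scratch. Because the pretrained encoder is then expected to overfit while the decoder underfits, the encoder and decoder get different warmup and learning rates. Specifically, the encoder uses a smaller learning rate with a smoother decay, so that it is trained with more accurate gradients once the decoder has stabilized.

A two-stage fine-tuning is also proposed: the encoder is first fine-tuned on the extractive summarization task and then fine-tuned on abstractive summarization. Prior work has reported that incorporating an extractive summarization objective improves abstractive summarization performance, and this finding is adopted here. In this setup, the weights learned for extractive summarization are transferred to the abstractive task.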
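To make the two mechanisms in this note concrete, here is a minimal Python sketch. It is not the authors' code. `build_bertsum_inputs` and `lr_at` are hypothetical helper names. The trailing `[SEP]` after each sentence, the schedule formula (linear warmup, then inverse-square-root decay), and the constants (2e-3 with 20,000 warmup steps for the encoder, 0.1 with 10,000 for the decoder) are what I recall from the paper and are not stated in this thread, so treat them as assumptions to verify.

```python
from typing import List, Tuple


def build_bertsum_inputs(
    sentences: List[List[str]],
) -> Tuple[List[str], List[int], List[int]]:
    """Flatten tokenized sentences into one BERT-style input sequence.

    Returns (tokens, segment_ids, cls_positions).
    """
    tokens: List[str] = []
    segment_ids: List[int] = []
    cls_positions: List[int] = []
    for i, sent in enumerate(sentences):
        # Remember where this sentence's [CLS] lands; its encoder output
        # is used as the sentence representation.
        cls_positions.append(len(tokens))
        # Assumption: each sentence is also closed with [SEP].
        piece = ["[CLS]"] + sent + ["[SEP]"]
        tokens.extend(piece)
        # Segment ids alternate 0, 1, 0, 1, ... from sentence to sentence.
        segment_ids.extend([i % 2] * len(piece))
    return tokens, segment_ids, cls_positions


def lr_at(step: int, scale: float, warmup: int) -> float:
    """Linear warmup until `warmup`, then inverse-square-root decay.

    `step` must be >= 1.
    """
    return scale * min(step ** -0.5, step * warmup ** -1.5)


# Assumed constants (recalled from the paper, not given in this thread).
ENC_SCALE, ENC_WARMUP = 2e-3, 20_000   # pretrained encoder: small, slow
DEC_SCALE, DEC_WARMUP = 0.1, 10_000    # fresh decoder: large, fast

if __name__ == "__main__":
    toks, segs, cls_pos = build_bertsum_inputs(
        [["the", "cat", "sat"], ["it", "slept"]]
    )
    print(toks)
    print(segs)
    print(cls_pos)
    for step in (1_000, 10_000, 20_000, 50_000):
        enc = lr_at(step, ENC_SCALE, ENC_WARMUP)
        dec = lr_at(step, DEC_SCALE, DEC_WARMUP)
        print(step, enc, dec)
```

In a real training loop each schedule would drive its own optimizer, one over the encoder parameters and one over the decoder parameters. With the assumed constants the encoder rate stays well below the decoder rate and peaks later, which matches the "smaller learning rate, smoother decay" description above.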

Text Summarization with Pretrained Encoders, Liu+ (with Lapata), EMNLP-IJCNLP'19 #1022
