This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state-of-the-art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve the model's performance on low-resource summarization tasks. The model is first pre-trained on text corpora for language understanding, and then continually pre-trained on summarization corpora for grounded text generation. Second, we replace the self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner. Z-Code++ sets a new state of the art on 9 of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the fine-tuned 200x larger GPT-3 175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models.

Translation (by gpt-3.5-turbo)

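This paper proposes Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model uses three techniques to extend the state-of-the-art encoder-decoder model. First, it uses two-phase pre-training to improve performance on low-resource summarization tasks. The model is first pre-trained on text corpora for language understanding, and then continually pre-trained on summarization corpora for grounded text generation. Second, the self-attention layers in the encoder are replaced with disentangled attention layers, in which each word is represented by two vectors that encode its content and its position. Third, it uses fusion-in-encoder, a simple and effective method for encoding long sequences hierarchically. Z-Code++ achieves state-of-the-art performance on 9 of 13 text summarization tasks across 5 languages. The model is also parameter-efficient: it outperforms the 600x larger PaLM-540B on XSum and the 200x larger GPT-3 175B on SAMSum. In zero-shot and few-shot settings, the model substantially outperforms competing models.

Summary (by gpt-3.5-turbo)

This paper proposes Z-Code++, a new pre-trained language model optimized for abstractive text summarization. Z-Code++ combines two-phase pre-training, disentangled attention layers, and fusion-in-encoder. The model sets a new state of the art on 9 of 13 summarization tasks across 5 languages, is parameter-efficient, and substantially outperforms competing models in zero-shot and few-shot settings.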
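
The abstract describes disentangled attention only briefly: each word carries one vector for its content and one for its position. A minimal sketch of how such attention scores could be computed is given below. It assumes the DeBERTa-style formulation (content-to-content, content-to-position, and position-to-content terms, scaled by the square root of 3d), which the abstract itself does not spell out. All names here (`relative_bucket`, `disentangled_scores`, `q_c`, `k_r`, and so on) are hypothetical, the vectors are random stand-ins for learned projections, and only a single attention head is shown.

```python
import math
import random


def relative_bucket(i, j, k):
    """Map the relative distance i - j to an index in [0, 2k), clipping at +/- k."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def disentangled_scores(q_c, k_c, q_r, k_r, k):
    """Attention scores built from separate content and position vectors.

    q_c, k_c: content queries/keys, one d-dimensional vector per token.
    q_r, k_r: relative-position queries/keys, one vector per bucket (2k buckets).
    """
    n, d = len(q_c), len(q_c[0])
    scale = math.sqrt(3 * d)  # three score terms are summed, hence 3d
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])                         # content-to-content
            c2p = dot(q_c[i], k_r[relative_bucket(i, j, k)])  # content-to-position
            p2c = dot(k_c[j], q_r[relative_bucket(j, i, k)])  # position-to-content
            scores[i][j] = (c2c + c2p + p2c) / scale
    return scores


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_vectors(rows, dim):
    # Random stand-ins for learned projections (assumption for illustration).
    return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


random.seed(0)
n, d, k = 5, 4, 2  # toy sizes: 5 tokens, 4 dimensions, max relative distance 2
q_c, k_c = random_vectors(n, d), random_vectors(n, d)
q_r, k_r = random_vectors(2 * k, d), random_vectors(2 * k, d)

# Each row holds one token's attention distribution over all tokens.
weights = [softmax(row) for row in disentangled_scores(q_c, k_c, q_r, k_r, k)]
```

Keeping position in its own set of vectors means the position terms depend only on the clipped relative distance between two tokens, so the same `2k` position vectors are reused at every absolute offset in the sequence.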


Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization, ACL'23 #816
