Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been investigated only for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG): models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations: one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open-domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
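As a sketch of the two formulations (notation assumed here, not given in the abstract: x is the input, y the output of length N, z a retrieved passage, p_\eta the retriever and p_\theta the seq2seq generator), the first variant marginalizes over the top-k retrieved passages once per output sequence, while the second marginalizes per token:

\[
p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \operatorname{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})
\]

\[
p_{\text{RAG-Token}}(y \mid x) \approx \prod_{i=1}^{N} \; \sum_{z \in \operatorname{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \, p_\theta(y_i \mid x, z, y_{1:i-1})
\]

The per-token form lets the generator draw on different passages for different parts of the output, at the cost of marginalizing at every decoding step.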