Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been investigated only for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG): models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations: one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open-domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
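As a sketch of the two formulations (notation assumed here, not given in the abstract: x is the input, y the output of length N, z a retrieved passage, p_\eta the retriever and p_\theta the seq2seq generator), the first variant marginalizes over the top-k retrieved passages once per output sequence, while the second marginalizes per token:

\[
p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \operatorname{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})
\]

\[
p_{\text{RAG-Token}}(y \mid x) \approx \prod_{i=1}^{N} \; \sum_{z \in \operatorname{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \, p_\theta(y_i \mid x, z, y_{1:i-1})
\]

The per-token form lets the generator draw on different passages for different parts of the output, at the cost of marginalizing at every decoding step.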