Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that augments LMs with retrieval of relevant knowledge, decreases such issues. However, indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. Experiments show that Self-RAG (7B and 13B parameters) significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks. Specifically, Self-RAG outperforms ChatGPT and retrieval-augmented Llama2-chat on open-domain QA, reasoning, and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models.
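To make the adaptive-retrieval idea concrete, the following is a minimal sketch of a Self-RAG-style inference loop, not the authors' implementation. The callables `generate`, `retrieve`, and `critique_score`, and the literal "[Retrieve]" marker, are hypothetical stand-ins for the LM, the retriever, and the scoring derived from reflection tokens.

```python
def self_rag_answer(question, generate, retrieve, critique_score, top_k=3):
    """Answer a question, retrieving passages only when the LM asks for it.

    Hypothetical sketch: `generate` is an LM call, `retrieve` a passage
    retriever, and `critique_score` collapses reflection-token judgments
    (relevance, support) into a single float per candidate.
    """
    # Step 1: the LM decides whether retrieval is needed by emitting a
    # special reflection token, represented here as the string "[Retrieve]".
    draft = generate(question)
    if "[Retrieve]" not in draft:
        return draft  # parametric knowledge judged sufficient

    # Step 2: retrieve passages on demand and generate one candidate
    # continuation per passage, rather than a fixed prepended context.
    candidates = []
    for passage in retrieve(question, top_k):
        answer = generate(question, context=passage)
        # Step 3: score each candidate by how well the passage supports it.
        candidates.append((critique_score(answer, passage), answer))

    # Step 4: return the best-supported candidate.
    return max(candidates)[1]
```

At inference time the weights on the critique scores can be changed per task, which is the sense in which reflection tokens make the model's behavior controllable without retraining.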