Extending the context window of large language models (LLMs) has recently become popular, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by studying both solutions using two state-of-the-art pretrained LLMs, i.e., a proprietary 43B GPT and LLaMA2-70B. Perhaps surprisingly, we find that an LLM with a 4K context window using simple retrieval-augmentation at generation can achieve performance comparable to a finetuned LLM with a 16K context window via positional interpolation on long-context tasks, while taking much less computation. More importantly, we demonstrate that retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes. Our best model, retrieval-augmented LLaMA2-70B with a 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on seven long-context tasks including question answering and query-based summarization. It also outperforms its non-retrieval LLaMA2-70B-32k baseline by a margin, while being much faster at generation. Our study provides general insights on the choice of retrieval-augmentation versus long-context extension of LLMs for practitioners.