Retrieving documents and prepending them in-context at inference time improves the performance of language models (LMs) on a wide range of tasks. However, these documents, often spanning hundreds of words, make inference substantially more expensive. We propose compressing the retrieved documents into textual summaries prior to in-context integration. This not only reduces the computational cost but also relieves LMs of the burden of identifying relevant information in long retrieved documents. We present two compressors -- an extractive compressor, which selects useful sentences from retrieved documents, and an abstractive compressor, which generates summaries by synthesizing information from multiple documents. Both compressors are trained to improve LMs' performance on end tasks when the generated summaries are prepended to the LMs' input, while keeping the summaries concise. If the retrieved documents are irrelevant to the input or offer no additional information to the LM, our compressor can return an empty string, implementing selective augmentation. We evaluate our approach on language modeling and open-domain question answering. We achieve a compression rate as low as 6% with minimal loss in performance on both tasks, significantly outperforming off-the-shelf summarization models. We show that our compressors trained for one LM can transfer to other LMs on the language modeling task and produce summaries largely faithful to the retrieved documents.
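As a minimal sketch of the prepend-and-selectively-augment idea described above (not the paper's trained compressors -- the `overlap_compressor` below is a hypothetical stand-in that keeps sentences sharing a word with the query):

```python
def augment_prompt(query, retrieved_docs, compressor):
    """Prepend a compressed summary of retrieved documents to the query.

    `compressor` maps (query, docs) to a textual summary; returning an
    empty string implements selective augmentation (no prepending).
    """
    summary = compressor(query, retrieved_docs)
    if not summary:          # documents judged irrelevant or uninformative
        return query         # skip augmentation entirely
    return summary + "\n\n" + query


def overlap_compressor(query, docs):
    """Toy extractive compressor: keep sentences with word overlap."""
    query_words = set(query.lower().split())
    kept = []
    for doc in docs:
        for sent in doc.split(". "):
            if query_words & set(sent.lower().split()):
                kept.append(sent)
    return ". ".join(kept)
```

With a relevant document, a short extracted summary is prepended; with an irrelevant one, the query passes through unchanged.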