HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Retrieval Augmented Causal Generation #45

Open ClashLuke opened 2 years ago

ClashLuke commented 2 years ago

DeepMind demonstrated in their recent RETRO paper that augmenting a language model's input with text retrieved from a corpus allows it to learn to copy relevant passages instead of storing them in its weights. This text retrieval is another solution to the problem mentioned in #8 and doesn't involve modifying the model. Instead, RETRO first retrieves similar text using BERT embeddings and then feeds that text into the cross-attention of their model together with the original prompt. This way, the decoder of their T5 model is aware of similar texts without storing them in its weights.

We could implement a similar architecture without cross-attention (#44) by using only autoregressive language modelling and retrieving chunks with BERT (or our own) embeddings. It would even be possible to test this approach without retraining a model by simply retrieving relevant chunks and feeding them into the context of our model (instead of using padding tokens).

This issue tracks the progress of the initial proof of concept, its benchmarks against the baseline, and its overall progress.
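A minimal sketch of that retrieval-into-context idea, under stated assumptions: the `embed` function below is a hypothetical stand-in for a frozen BERT (as in RETRO) or our own encoder, and the corpus is assumed to be pre-chunked. Retrieved chunks are simply prepended to the prompt, occupying the space that would otherwise hold padding tokens; nothing here reflects the actual HomebrewNLP pipeline.

```python
import jax.numpy as jnp


def embed(texts, dim=256):
    # Hypothetical placeholder encoder: a hashed bag-of-words vector per text,
    # only so the sketch runs end to end. In practice this would be frozen
    # BERT embeddings (as in RETRO) or embeddings from our own model.
    vecs = []
    for text in texts:
        v = jnp.zeros(dim)
        for tok in text.split():
            v = v.at[hash(tok) % dim].add(1.0)
        vecs.append(v)
    return jnp.stack(vecs)


def retrieve_chunks(query, corpus_chunks, chunk_embeddings, k=2):
    # Cosine similarity between the query embedding and every pre-computed
    # chunk embedding, then take the k closest chunks.
    q = embed([query])[0]
    sims = (chunk_embeddings @ q) / (
        jnp.linalg.norm(chunk_embeddings, axis=1) * jnp.linalg.norm(q) + 1e-8
    )
    top = jnp.argsort(-sims)[:k]
    return [corpus_chunks[int(i)] for i in top]


def build_prompt(query, corpus_chunks, chunk_embeddings, k=2):
    # Prepend retrieved chunks to the prompt so the causal model sees the
    # relevant passages in its context instead of padding tokens.
    retrieved = retrieve_chunks(query, corpus_chunks, chunk_embeddings, k)
    return "\n".join(retrieved) + "\n" + query


if __name__ == "__main__":
    corpus = [
        "the moon orbits the earth",
        "jax compiles to xla",
        "tpus train large language models",
    ]
    chunk_emb = embed(corpus)
    print(build_prompt("how do we train on tpus", corpus, chunk_emb, k=1))
```

For the proof of concept described above, only `embed` would need to change (to BERT or our own embeddings) and the chunk embeddings would be built once over the corpus; the model itself stays untouched, which is what makes the no-retraining test possible.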