Explore Landmark Attention

abrichr commented 1 year ago

https://github.com/epfml/landmark-attention

Our approach seamlessly integrates with specialized data structures and the system's memory hierarchy, enabling processing of arbitrarily long context lengths. We demonstrate that our method can obtain comparable performance with Transformer-XL while significantly reducing the number of retrieved tokens in each step. Finally, we show that fine-tuning LLaMA 7B with our method successfully extends its context length capacity up to 32k tokens, allowing for inference at the context lengths of GPT-4.
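To make "reducing the number of retrieved tokens in each step" concrete, here is a rough sketch of block retrieval as I understand the idea. It is not the code from the repo above. Assumptions: the function name `landmark_retrieve` and its parameters are made up for illustration, the landmark for each block is approximated by the mean of that block's keys (the paper trains a dedicated landmark token per block instead), and a plain softmax over the retrieved tokens stands in for the paper's grouped softmax.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def landmark_retrieve(query, keys, values, block_size, top_k):
    """Attend only over the top_k blocks whose landmark best matches the query.

    Hypothetical helper, not the epfml/landmark-attention API.
    Returns (attention output, indices of the tokens actually attended to).
    """
    dim = len(query)
    # Split the context into consecutive blocks of token indices.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]
    # ASSUMPTION: stand-in landmark = mean of the block's keys.
    # The paper learns a landmark token per block during training.
    landmarks = [
        [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        for block in blocks
    ]
    # Score each block by how well its landmark matches the query.
    scores = [dot(query, lm) for lm in landmarks]
    chosen = sorted(range(len(blocks)), key=lambda b: scores[b], reverse=True)[:top_k]
    # Retrieve only the tokens of the chosen blocks, in original order.
    token_ids = [i for b in sorted(chosen) for i in blocks[b]]
    # Ordinary scaled dot-product attention over the retrieved tokens only.
    weights = softmax([dot(query, keys[i]) / math.sqrt(dim) for i in token_ids])
    out = [
        sum(w * values[i][d] for w, i in zip(weights, token_ids))
        for d in range(len(values[0]))
    ]
    return out, token_ids
```

With `top_k` blocks of `block_size` tokens, each step touches at most `top_k * block_size` keys instead of the whole context, which is the saving the abstract refers to.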

FFFiend commented 1 year ago

Models with Landmark Attention

abrichr commented 1 year ago

TODO: pick one or more models we want to deploy

OpenAdaptAI / OpenAdapt