OpenAdaptAI / OpenAdapt

AI-First Process Automation with Large Language (LLMs) / Action (LAMs) / Multimodal (LMMs) / Visual Language (VLMs) Models
https://www.OpenAdapt.AI
MIT License

Explore Landmark Attention #221

Open abrichr opened 1 year ago

abrichr commented 1 year ago

https://github.com/epfml/landmark-attention

https://arxiv.org/abs/2305.16300

Our approach seamlessly integrates with specialized data structures and the system's memory hierarchy, enabling processing of arbitrarily long context lengths. We demonstrate that our method can obtain comparable performance with Transformer-XL while significantly reducing the number of retrieved tokens in each step. Finally, we show that fine-tuning LLaMA 7B with our method successfully extends its context length capacity up to 32k tokens, allowing for inference at the context lengths of GPT-4.
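The idea in the quoted abstract is that each block of the context is summarized by a landmark token; at inference, attention scores against the landmarks select which blocks' tokens are actually retrieved, so attention cost no longer scales with the full context length. A minimal NumPy sketch of that retrieval pattern (not the paper's implementation: the paper trains a dedicated landmark token per block, whereas here a mean over the block's keys stands in for it, and `block_size`/`top_k` are illustrative parameters):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def landmark_retrieval_attention(q, keys, values, block_size=4, top_k=2):
    """Group keys into blocks, summarize each block with a 'landmark'
    vector, retrieve the top_k blocks by landmark score, then run
    standard attention restricted to tokens in the retrieved blocks."""
    n, d = keys.shape
    n_blocks = n // block_size
    blocks_k = keys[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    blocks_v = values[: n_blocks * block_size].reshape(n_blocks, block_size, d)

    # One landmark per block (mean of block keys as a stand-in for the
    # trained landmark token used in the paper).
    landmarks = blocks_k.mean(axis=1)            # (n_blocks, d)
    block_scores = landmarks @ q / np.sqrt(d)    # (n_blocks,)
    top = np.argsort(block_scores)[-top_k:]      # indices of retrieved blocks

    # Attention over only the retrieved tokens, not the whole context.
    k_sel = blocks_k[top].reshape(-1, d)
    v_sel = blocks_v[top].reshape(-1, d)
    w = softmax(k_sel @ q / np.sqrt(d))
    return w @ v_sel, sorted(top.tolist())
```

Per query, the model attends to `n_blocks + top_k * block_size` positions instead of `n`, which is what lets the fine-tuned LLaMA 7B in the paper handle 32k-token contexts while retrieving far fewer tokens per step.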

FFFiend commented 1 year ago

Models with Landmark Attention

abrichr commented 1 year ago

TODO: pick one or more models we want to deploy