Closed griff4692 closed 4 months ago
Efficient Streaming Language Models with Attention Sinks finds that the first k=4 tokens have global importance: they receive consistent attention regardless of their semantic relevance to the particular prompt.
Explore this and other possible "global tokens" --> instructions, queries, etc.
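For reference, the paper's cache policy can be sketched roughly like this: always retain the first `num_sinks` cached positions (the attention sinks) plus a sliding window of recent positions, evicting everything in between. The function name and parameters below are illustrative, not from any actual implementation:

```python
def evict_kv_cache(positions, num_sinks=4, window=8):
    """Sketch of StreamingLLM-style eviction: keep the first
    `num_sinks` positions (attention sinks) and the most recent
    `window` positions; evict the middle of the cache."""
    sinks = positions[:num_sinks]
    recent = positions[max(num_sinks, len(positions) - window):]
    return sinks + recent

# 20 cached positions -> keep 4 sinks + last 8
kept = evict_kv_cache(list(range(20)))
# kept == [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Extending the sink set to instruction or query tokens would just mean adding those positions to the always-kept set.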
Implemented and merged