Closed griff4692 closed 4 months ago
Efficient Streaming Language Models with Attention Sinks finds that the first k=4 tokens have global importance: they receive consistent attention regardless of their semantic relevance to the particular prompt.
Explore this and other possible "global tokens" --> instructions, queries, etc.
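For reference, the paper's cache policy can be sketched roughly like this: always retain the first `num_sinks` cached positions (the attention sinks) plus a sliding window of recent positions, evicting everything in between. The function name and parameters below are illustrative, not from any actual implementation:

```python
def evict_kv_cache(positions, num_sinks=4, window=8):
    """Sketch of StreamingLLM-style eviction: keep the first
    `num_sinks` positions (attention sinks) and the most recent
    `window` positions; evict the middle of the cache."""
    sinks = positions[:num_sinks]
    recent = positions[max(num_sinks, len(positions) - window):]
    return sinks + recent

# 20 cached positions -> keep 4 sinks + last 8
kept = evict_kv_cache(list(range(20)))
# kept == [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Extending the sink set to instruction or query tokens would just mean adding those positions to the always-kept set.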
Implemented and merged