PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Feature]: Sliding Window Attention? #637

Closed Abdulhanan535 closed 2 months ago

Abdulhanan535 commented 2 months ago

🚀 The feature, motivation and pitch

Will this be supported? I just saw that you uploaded the new post1 release.

Alternatives

.

Additional context

For Gemma 2 models.

AlpinDale commented 2 months ago

I think you mean Alternating Sliding Window Attention, which is used only by gemma2. This requires a non-trivial change to the attention and block manager code, so it'll be some time before I have the bandwidth to handle it. I'll be trying to work with the vLLM team to land this feature sometime soon.

For now, I recommend using tabbyAPI, as it uses exllamav2, which has alternating SWA implemented.
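
For readers unfamiliar with the term, here is a rough sketch of what "alternating" sliding window attention means at the mask level. It is not aphrodite-engine, vLLM, or exllamav2 code. The function names, the tiny window size, and the choice of which layers are windowed (even-indexed here) are all assumptions made for illustration. The point is that windowed layers only look at the most recent tokens while the other layers look at the whole prefix, which is presumably why the attention and block manager code cannot treat every layer the same way.

```python
def attention_mask(seq_len, window=None):
    """Build a seq_len x seq_len boolean mask.

    mask[q][k] is True when query position q may attend to key position k.
    window=None gives plain causal (global) attention; an integer limits
    each query to the last `window` positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q                      # never attend to the future
            in_window = window is None or (q - k) < window
            row.append(causal and in_window)
        mask.append(row)
    return mask


def layer_masks(num_layers, seq_len, window):
    """Return one mask per layer, alternating windowed and global attention.

    Assumption for illustration: even-indexed layers are windowed and
    odd-indexed layers are global. A real model defines its own pattern.
    """
    return [
        attention_mask(seq_len, window if layer % 2 == 0 else None)
        for layer in range(num_layers)
    ]


if __name__ == "__main__":
    masks = layer_masks(num_layers=2, seq_len=5, window=2)
    for layer, mask in enumerate(masks):
        kind = "sliding window" if layer % 2 == 0 else "global"
        print(f"layer {layer} ({kind})")
        for row in mask:
            print("".join("x" if allowed else "." for allowed in row))
```

In the windowed layer the last query only sees the final two positions, while in the global layer it sees all five, so the two kinds of layer need different amounts of cached keys and values.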