Abdulhanan535 closed this issue 2 months ago.
I think you mean Alternating Sliding Window Attention, which is used only by Gemma 2. It requires non-trivial changes to the attention and block manager code, so it will be some time before I have the bandwidth to handle it. I'll try to work with the vLLM team to land this feature soon.
For now, I recommend using tabbyAPI, which is built on exllamav2, where alternating SWA is already implemented.
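To illustrate what "alternating" means here and why the block manager is affected, below is a minimal Python sketch. It is not taken from vLLM, exllamav2, or the Gemma 2 code. It assumes that even-indexed layers use causal sliding-window (local) attention and odd-indexed layers use full causal (global) attention; the function names and the toy window size are hypothetical. Because a local layer never looks further back than its window, its KV cache only needs the most recent `window` tokens, while a global layer must keep the whole sequence. That is why one uniform cache layout no longer fits every layer.

```python
def attention_mask(layer_idx, seq_len, window):
    """Boolean mask: mask[i][j] is True if query position i may attend to key position j.

    Assumption (illustrative only): even layers are local (sliding window),
    odd layers are global (plain causal attention).
    """
    is_local = layer_idx % 2 == 0
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = j <= i  # causal: no attending to future tokens
            if is_local:
                # Sliding window: only the last `window` positions, i.e. i - window < j <= i.
                allowed = allowed and (i - j) < window
            row.append(allowed)
        mask.append(row)
    return mask


def kv_tokens_needed(layer_idx, seq_len, window):
    """How many past tokens this layer's KV cache must retain (hypothetical helper)."""
    if layer_idx % 2 == 0:
        return min(seq_len, window)  # local layer: older tokens can be evicted
    return seq_len  # global layer: every token stays in the cache


if __name__ == "__main__":
    num_layers, seq_len, window = 4, 6, 3  # toy sizes, not real Gemma 2 settings
    for layer in range(num_layers):
        kind = "local" if layer % 2 == 0 else "global"
        print(layer, kind, kv_tokens_needed(layer, seq_len, window))
```

With these toy sizes, the local layers report 3 cached tokens and the global layers report 6, so the per-layer cache requirements diverge as the sequence grows.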
🚀 The feature, motivation and pitch
Will it be supported? I just saw that you uploaded the new post1 release.
Alternatives
.
Additional context
For Gemma 2 models.