Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0
9.51k stars · 948 forks

Mistral v0.1 sliding window attention #1552

Status: Open · opened by rasbt 1 month ago

rasbt commented 1 month ago

Opening this issue so we don't forget: once #1545 is merged, let's also add sliding window attention to Mistral v0.1.

Andrei-Aksionov commented 1 month ago

One note: the tests for the Mistral model (regular model, adapter, and LoRA) should also be updated to exercise the sliding window attention, meaning the input length should be larger than the sliding window size. Otherwise, we will get false positives. The tests can be adapted from the corresponding tests for Gemma 2.
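To illustrate the point about test inputs (this is a sketch, not litgpt's actual implementation): with sliding window attention, position i may only attend to positions j in the range i - window_size < j <= i. When the sequence length is at most the window size, that constraint never bites, so the mask is identical to a plain causal mask and a test would pass even if the sliding window were not applied at all.

```python
def sliding_window_mask(seq_len: int, window_size: int) -> list[list[bool]]:
    """mask[i][j] is True if position i may attend to position j.

    Illustrative sketch of a sliding-window causal mask; not litgpt's
    actual implementation.
    """
    return [
        [i - window_size < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def causal_mask(seq_len: int) -> list[list[bool]]:
    """Plain causal mask: position i attends to all j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


# seq_len <= window_size: indistinguishable from plain causal attention,
# so such a test cannot catch a missing sliding window (false positive).
assert sliding_window_mask(4, 4) == causal_mask(4)

# seq_len > window_size: the masks differ, so the sliding-window code
# path is actually exercised.
assert sliding_window_mask(6, 4) != causal_mask(6)
```

This is why the test inputs need to be longer than the window size: only then does the sliding-window mask diverge from the plain causal mask.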