elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0
1.26k stars 90 forks source link

Implement attention sliding window for Mistral #341

Closed jonatanklosko closed 4 months ago

jonatanklosko commented 4 months ago

Attention sliding window means that each token "attends to" N prior and N later tokens.