ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License
23 stars 17 forks

Quest: Create a training-efficient version of windowed attention for experimenting with long contexts #139

Open: gkielian opened this issue 7 months ago

gkielian commented 7 months ago

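This should let us experiment with longer contexts by reducing the memory that attention needs: with windowed attention, each token attends only to a fixed-size window of recent tokens rather than to the full context.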
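To make the idea concrete, here is a minimal reference sketch of causal sliding-window attention in plain Python. It is not this repository's implementation and is not training-efficient; the names `windowed_causal_attention` and `score_count` are hypothetical, and a real version would presumably be written in PyTorch with a banded or blocked kernel. It only shows which query-key pairs are scored and how that count grows with context length.

```python
import math


def windowed_causal_attention(q, k, v, window):
    """Single-head attention where position i attends to keys in [i - window + 1, i].

    q, k, v are lists of T vectors (lists of floats). Hypothetical helper for
    illustration only; it loops in Python and is far too slow for training.
    """
    T, d = len(q), len(q[0])
    out = []
    for i in range(T):
        start = max(0, i - window + 1)  # left edge of the causal window
        # Scaled dot-product scores, computed only for keys inside the window.
        scores = [
            sum(q[i][c] * k[j][c] for c in range(d)) / math.sqrt(d)
            for j in range(start, i + 1)
        ]
        # Numerically stable softmax over the window.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted sum of the value vectors inside the window.
        out.append([
            sum(w * v[start + n][c] for n, w in enumerate(weights)) / z
            for c in range(len(v[0]))
        ])
    return out


def score_count(T, window):
    """Number of query-key scores computed for a context of length T."""
    return sum(min(i + 1, window) for i in range(T))
```

With `window >= T` this reduces to ordinary causal attention, whose score count grows quadratically in `T`. With a fixed `window`, the count grows roughly as `T * window`, which is where the memory saving for long contexts would come from, assuming the implementation never materializes the full `T x T` score matrix.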