ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add scripts for testing out-of-distribution addition #158

Closed: klei22 closed this pull request 4 months ago

klei22 commented 4 months ago

This PR primarily adds scripts for testing out-of-distribution addition.
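To show what an out-of-distribution addition test set looks like, here is a minimal sketch. It is not the scripts added in this PR. The function name `make_addition_examples` and the `a+b=c` text format are illustrative assumptions. The idea is to train on operands up to some digit length and to evaluate only on strictly longer operands.

```python
import random

def make_addition_examples(num_examples, min_digits, max_digits, seed=0):
    """Return 'a+b=c' strings whose two operands each have n digits,
    with n drawn uniformly from [min_digits, max_digits].

    Hypothetical helper for illustration; not the PR's actual script.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(num_examples):
        n = rng.randint(min_digits, max_digits)
        # Smallest/largest n-digit numbers (1-digit range also allows 0).
        lo = 0 if n == 1 else 10 ** (n - 1)
        hi = 10 ** n - 1
        a = rng.randint(lo, hi)
        b = rng.randint(lo, hi)
        examples.append(f"{a}+{b}={a + b}")
    return examples

# In-distribution training data: operands of 1 to 5 digits.
train = make_addition_examples(1000, 1, 5, seed=1)
# Out-of-distribution test data: operands of 6 to 8 digits, never seen in training.
test_ood = make_addition_examples(200, 6, 8, seed=2)
```

Keeping the train and test length ranges disjoint means any accuracy on `test_ood` has to come from length generalization rather than from memorized operand lengths.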

Transformers perform poorly on addition when the operands are longer than those seen in their training set.
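One way to make this failure visible is to report exact-match accuracy separately for each operand length, so the drop past the training length stands out. The sketch below assumes `a+b=c` formatted examples and a hypothetical `predict` hook that maps a prompt such as `12+34=` to the model's answer string (for example, a wrapper around GPT sampling). The toy predictor only imitates a model that fails beyond 3 digits; it is not real model output.

```python
from collections import defaultdict

def accuracy_by_length(examples, predict):
    """Exact-match accuracy grouped by the digit length of the first operand.

    `examples` are 'a+b=c' strings; `predict` is a hypothetical hook that
    maps a prompt like 'a+b=' to the model's answer string.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        prompt, answer = ex.split("=")
        prompt += "="
        n = len(prompt.split("+")[0])
        total[n] += 1
        if predict(prompt).strip() == answer:
            correct[n] += 1
    return {n: correct[n] / total[n] for n in sorted(total)}

# Toy stand-in for a model that only handles operands of up to 3 digits.
def toy_predict(prompt):
    a, b = prompt.rstrip("=").split("+")
    return str(int(a) + int(b)) if len(a) <= 3 else "0"

examples = ["12+34=46", "123+456=579", "1234+5678=6912", "99999+1=100000"]
print(accuracy_by_length(examples, toy_predict))
# {2: 1.0, 3: 1.0, 4: 0.0, 5: 0.0}
```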

This PR explores how to get transformers to actually generalize to longer additions while keeping the context length minimal.

It also includes fixes that align the softmax variations with the publication. These fixes should help test which combination of configurations gives the best length generalization.