ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

RMSNorm Recompute #290

Open mmoffatt2 opened 1 month ago

mmoffatt2 commented 1 month ago

To fully emulate the hardware, I am adding RMSNorm recompute, which defers the division by the RMS until after the multiplication by W. For pre-LN, this means the division happens after the Q, K, and V linear layers in attention and after the up-projection linear layer in the MLP.
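Because the RMS is a single scalar per token, moving the division past the linear layer does not change the result beyond floating-point rounding: (x / rms(x)) W = (x W) / rms(x). The minimal pure-Python sketch below compares the two orderings. It assumes the RMSNorm gain g is applied element-wise to x before W (equivalently, folded into W) and that eps sits inside the square root; the function names are hypothetical and do not come from this PR.

```python
import math

def rms(x, eps=1e-5):
    """Root-mean-square of one token vector x (list of floats), eps inside the sqrt."""
    return math.sqrt(sum(v * v for v in x) / len(x) + eps)

def matvec(x, W):
    """Row vector x (length d_in) times matrix W (d_in rows, d_out columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def rmsnorm_then_linear(x, g, W, eps=1e-5):
    """Standard pre-LN order: divide by RMS, scale by gain g, then project with W."""
    r = rms(x, eps)
    normed = [xi / r * gi for xi, gi in zip(x, g)]
    return matvec(normed, W)

def linear_then_divide(x, g, W, eps=1e-5):
    """Recompute order (assumed): apply gain and W first, divide by RMS last.
    The RMS is still computed from the un-projected input x."""
    r = rms(x, eps)
    projected = matvec([xi * gi for xi, gi in zip(x, g)], W)
    return [p / r for p in projected]

# Small example: both orderings agree up to rounding.
x = [0.5, -1.0, 2.0, 0.25]
g = [1.0, 0.9, 1.1, 1.2]
W = [[0.1, 0.2], [0.3, -0.4], [-0.5, 0.6], [0.7, 0.8]]
a = rmsnorm_then_linear(x, g, W)
b = linear_then_divide(x, g, W)
print(all(math.isclose(u, v, rel_tol=1e-9, abs_tol=1e-12) for u, v in zip(a, b)))
```

In pre-LN attention, Q, K, and V all read the same normalized input, so a single RMS value per token can be computed once and applied after each of the three projections.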

I also added recompute quantization, as described in https://drive.google.com/drive/u/1/folders/1tOjBEBoXytUgU7R95aqWI6deCwyPAWjl