ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add partial RMSnorm "pRMSNorm" variation #153

Closed gkielian closed 4 months ago

gkielian commented 4 months ago

This was described in the RMSNorm paper as accomplishing the same task as RMSNorm, usually while computing the RMS statistic over only the first 6% of entries.

This works because the average that RMSNorm computes changes more slowly as more items are added, so an estimate from a small prefix is already close to the full value. The RMSNorm authors also noted that the tokens they measured had around the same value.
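A minimal, framework-free sketch of the idea, not the implementation proposed for this repository. The function names `rms_norm` and `partial_rms_norm`, the `eps` term, and the default `p=0.06` (taken from the 6% figure above) are assumptions for illustration, and the learnable gain that RMSNorm normally applies is omitted:

```python
import math

def rms_norm(x, eps=1e-8):
    """Full RMSNorm: divide each entry by the RMS of all entries."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / (rms + eps) for v in x]

def partial_rms_norm(x, p=0.06, eps=1e-8):
    """pRMSNorm sketch: estimate the RMS from only the first k entries.

    k is ceil(p * n), with a floor of 1 so short vectors still work.
    All entries are still rescaled; only the statistic uses the prefix.
    """
    k = max(1, math.ceil(p * len(x)))
    rms = math.sqrt(sum(v * v for v in x[:k]) / k)
    return [v / (rms + eps) for v in x]
```

When the entries have similar magnitude, the prefix estimate and the full RMS agree closely, so both functions return nearly the same vector. When the first entries are not representative of the rest, the two can differ noticeably.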

klei22 commented 4 months ago

Looks good