ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Add polymax with relu2 forward pass (PolymaxQuan) #159

Closed · gkielian closed this 3 months ago

gkielian commented 4 months ago

While this is not yet a full quantization, we are taking it in steps: first testing whether we can simply replace polymax with relu2 in the forward pass. The results below suggest that, at least with post-norm, the effect of the swap is small (in fact, preliminary results show a slight improvement):

This is with post-norm; pre-norm results should follow in a few minutes:

[image: post-norm loss curves comparing polymax with the relu2 forward pass]
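
For reference, here is a minimal sketch of what the relu2 forward-pass swap looks like; the class and attribute names below are illustrative, not necessarily the repo's exact implementation:

```python
# Minimal sketch (illustrative names, not necessarily the repo's exact
# classes) of a relu^2 activation used in place of polymax in the forward pass.
import torch
import torch.nn as nn

class ReLUSquared(nn.Module):
    """relu(x)**2: outputs are >= 0, so there is no left tail to quantize."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) ** 2

# In the MLP block, the activation would simply be swapped, e.g.:
#   self.act = ReLUSquared()   # instead of the polymax activation
#   ...
#   x = self.c_proj(self.act(self.c_fc(x)))
```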

Polymax has a tail on the left which is difficult to handle when quantizing aggressively: very high precision is needed there, so it is hard to capture in int8, or in integer formats generally.
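
To illustrate the tail issue, here is a toy example of symmetric int8 fake quantization (the helper and the activation values are made up for illustration, not taken from the repo):

```python
# With a symmetric int8 scheme, the scale is set by the largest magnitude,
# so a small negative tail gets only a handful of quantization levels.
import torch

def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().max() / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127)
    return q * scale

acts = torch.tensor([6.0, 2.5, 0.1, -0.02, -0.05])  # polymax-like: small left tail
print(fake_quant_int8(acts))
# The -0.02 value collapses to zero and -0.05 lands on the coarse -1 level,
# whereas relu^2 outputs are >= 0 and need no negative range at all.
```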

We'll have to continue testing on different types of datasets, but this is a strong indicator that at least the relu^2 part of the quantization is stable, and we can continue iterating to add quantization argparse options for the forward pass.
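
As a rough sketch of the kind of argparse options this could lead to (the flag names below are hypothetical placeholders, not a final interface):

```python
# Hypothetical forward-pass quantization options; names are placeholders.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--activation_variant", type=str, default="polymax",
                    choices=["polymax", "relu2"],
                    help="activation used in the MLP forward pass")
parser.add_argument("--quantize_forward_act", action="store_true",
                    help="fake-quantize activations in the forward pass")
parser.add_argument("--forward_act_bits", type=int, default=8,
                    help="bit width for forward-pass activation quantization")
args = parser.parse_args()
```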

gkielian commented 4 months ago

[image: loss curves for post-norm and pre-norm runs, with and without rotary embeddings]

It seems to work for both post-norm and pre-norm, though it works best with rotary embeddings.