ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add quantized krmsnorm #192

Closed: klei22 closed this 1 month ago

klei22 commented 2 months ago

This adds more options for kRMSNorm:

It also adds a configuration sweep JSON for searching the kRMSNorm settings space, including krmsnorm_num.
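
As a rough illustration of what searching a settings space means, a sweep can be treated as a Cartesian product over lists of candidate values. The sketch below is only an assumption about the general shape, not the sweep file from this PR: krmsnorm_num is the only key named here, while `quantize_krmsnorm` and `krmsnorm_bits` are hypothetical names, as are all the values.

```python
import itertools
import json

# Hypothetical sweep definition. Only "krmsnorm_num" is named in this PR;
# the other keys and all values are made up for illustration.
sweep_json = """
{
  "krmsnorm_num": [2, 8, 32],
  "quantize_krmsnorm": [false, true],
  "krmsnorm_bits": [8, 16]
}
"""


def expand_grid(sweep):
    """Yield one settings dict per combination of the listed values."""
    keys = list(sweep)
    for values in itertools.product(*(sweep[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(expand_grid(json.loads(sweep_json)))
print(len(configs))  # 3 * 2 * 2 = 12 combinations
```

A full grid grows multiplicatively with each added key, which is worth keeping in mind when adding more kRMSNorm options to a sweep.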

klei22 commented 2 months ago

image

Uploading an image of the current training run: a 'grid_search' sweep over all setting types, launched from run_vizier.py.

klei22 commented 2 months ago

Updates below. It also appears that run_vizier doesn't handle a NaN result in a checkpoint well, so I'm using run_experiments.py going forward and filing a bug.

image
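
On the NaN issue mentioned above, one possible way for a sweep driver to tolerate a diverged run is to map a NaN loss to a penalty value before comparing results. This is a minimal sketch under that assumption, not how run_vizier.py or run_experiments.py actually behave; `usable_loss` and `best_run` are hypothetical helper names, and the result list is made up.

```python
import math


def usable_loss(val_loss, penalty=float("inf")):
    """Return val_loss, or `penalty` when the run produced no loss or NaN."""
    if val_loss is None or math.isnan(val_loss):
        return penalty
    return val_loss


def best_run(results):
    """Pick the (name, loss) pair with the lowest usable loss."""
    return min(results, key=lambda item: usable_loss(item[1]))


# Made-up results: one run diverged to NaN and should never be selected.
results = [("run_a", float("nan")), ("run_b", 3.1), ("run_c", 2.9)]
print(best_run(results))
```

Without such a guard, comparisons against NaN are always false, so a NaN entry can silently win or lose a `min` depending on its position in the list.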

klei22 commented 2 months ago

image

The plot above shows that runs without QAT reach better validation loss, that higher precision (int16) does better, and that larger 'k' values appear to have a larger impact when quantized.
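
To make the roles of 'k' and the bit width concrete, here is a small pure-Python sketch. It assumes that kRMSNorm estimates the RMS from the first k elements of the input, and that quantization is simulated by symmetric fake quantization of those k elements. Both are assumptions for illustration; the PR's implementation may quantize different tensors or use a different scheme, and `fake_quantize` and `krms_norm` are hypothetical names.

```python
import math


def fake_quantize(values, bits):
    """Round values to a symmetric signed integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    return [round(v / scale) * scale for v in values]


def krms_norm(x, k, bits=None, eps=1e-8):
    """Divide x by the RMS of its first k elements (assumed kRMSNorm behavior).

    When `bits` is given, the k elements are fake-quantized before the RMS
    is computed, which is one assumed placement of the quantization step.
    """
    head = x[:k]
    if bits is not None:
        head = fake_quantize(head, bits)
    rms = math.sqrt(sum(v * v for v in head) / len(head))
    return [v / (rms + eps) for v in x]


x = [0.3, -1.7, 2.2, 0.9]
full_precision = krms_norm(x, k=len(x))
int8_version = krms_norm(x, k=len(x), bits=8)
int16_version = krms_norm(x, k=len(x), bits=16)
```

In this toy setup the int16 grid is far finer than the int8 grid, so the quantized RMS stays closer to the full-precision value. That is only a property of the sketch and does not by itself explain the training results in the plot.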