ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License
Add kan and hyperparams #182

Closed: gkielian closed this 2 months ago

gkielian commented 2 months ago

This PR combines the latest PRs that add the ability to experiment with KAN (Kolmogorov-Arnold Network) layers as a replacement for linear layers and for the MLP, while retaining backward compatibility with prior features.

The current KAN implementation needed one small change: a clamp to prevent division by zero.

After implementing the above, inference with the sample.py file became much more stable.
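
For illustration only, here is a minimal sketch of what clamping a denominator to avoid division by zero can look like. The thread does not show where the clamp was applied in the KAN code, so the function name `safe_divide` and the `eps` value are hypothetical, and plain Python floats stand in for tensors. In PyTorch, a similar effect for non-negative denominators is available with `torch.clamp(den, min=eps)`.

```python
def safe_divide(numerator, denominator, eps=1e-8):
    """Divide after clamping the denominator's magnitude to at least eps.

    Hypothetical helper, not taken from the PR: `eps` is an assumed value.
    """
    # Preserve the denominator's sign; an exact zero is treated as positive.
    sign = -1.0 if denominator < 0 else 1.0
    # Clamp the magnitude so it can never be smaller than eps.
    clamped = sign * max(abs(denominator), eps)
    return numerator / clamped


# A zero denominator no longer raises ZeroDivisionError.
print(safe_divide(1.0, 0.0))
# Ordinary denominators are left unchanged.
print(safe_divide(1.0, 2.0))
```

The trade-off is that very small denominators yield a large but finite value rather than an error, inf, or NaN, so a choice of `eps` that is too large can bias the result.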

klei22 commented 2 months ago

The wrapped nn.Linear uses a different initialization. Would it be possible to have it match the previous nn.Linear initialization values?

gkielian commented 2 months ago

Sounds good. I will look into whether we can forward the initialization values from model.py.

klei22 commented 2 months ago

[image attached]

I tested it too, and it actually seems like it should be fine. I will make some direct edits and merge it in.

klei22 commented 2 months ago

Merged in the latest repo changes and created a new PR with suggested adjustments:

https://github.com/ReaLLMASIC/nanoGPT/pull/189