ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

replace linear with KAN #173

Closed. SenmiaoORZ closed this 2 months ago

gkielian commented 3 months ago

I have been testing out the KAN module, and it seems to work better as an MLP replacement than as an nn.Linear replacement.

I haven't been able to get stable inference yet with the nn.Linear replacement, but inference with the MLP replacement works well until the model begins overfitting.

Let's hold off on adding it to nn.Linear until we can confirm that inference works. We can speak more on Tuesday about the next steps.

It's awesome that it does appear to work for the MLP, and I'm looking forward to starting the discussion of next steps.

gkielian commented 2 months ago

@SenmiaoORZ

I just finished making some adjustments, more details in: https://github.com/SenmiaoORZ/nanoGPT/pull/1/

In essence, I rewrote the Kalnet just a little bit to allow for better polymorphism between the existing linear variations, added argparse parameters for the base activation type, polynomial order, and other KAN features, and added a stability enhancement for the forward pass and inference that lets us use sample.py (and we can probably try the attention replacement again now too, I think).
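
For readers following along, here is a rough sketch of what command-line options like the ones described above might look like, written with only the Python standard library rather than PyTorch. Everything named here is an assumption for illustration: the flag names `--kan_poly_order` and `--kan_base_activation`, the `BASE_ACTIVATIONS` table, the choice of a Legendre basis, and the `tanh` squashing of inputs are not taken from the PR. The thread does not say what the actual stability enhancement is; squashing into [-1, 1] is just one common way to keep polynomial terms bounded.

```python
import argparse
import math

# Hypothetical table of base activations; the real option values may differ.
BASE_ACTIVATIONS = {
    "silu": lambda x: x / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "relu": lambda x: max(0.0, x),
}


def build_parser():
    """Build a parser with illustrative (assumed) KAN flag names."""
    parser = argparse.ArgumentParser(description="KAN option sketch")
    parser.add_argument("--kan_poly_order", type=int, default=3,
                        help="highest polynomial degree used by the layer")
    parser.add_argument("--kan_base_activation", type=str, default="silu",
                        choices=sorted(BASE_ACTIVATIONS),
                        help="activation applied on the base path")
    return parser


def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] after squashing x into [-1, 1]."""
    # Assumed stabilizer: Legendre polynomials stay within [-1, 1] on this range.
    x = math.tanh(x)
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def kan_unit(x, poly_weights, base_weight, activation):
    """Scalar toy unit: weighted base activation plus weighted polynomial terms."""
    basis = legendre_basis(x, len(poly_weights) - 1)
    poly_out = sum(w * b for w, b in zip(poly_weights, basis))
    return base_weight * activation(x) + poly_out


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--kan_poly_order", "2", "--kan_base_activation", "tanh"]
)
activation = BASE_ACTIVATIONS[args.kan_base_activation]
poly_weights = [0.5, 1.0, 2.0]  # one weight per degree 0..kan_poly_order
output = kan_unit(0.0, poly_weights, 1.0, activation)
```

Looking the activation up by name in a table is one simple way to let a single flag switch behavior without touching the layer code, which is roughly the kind of polymorphism between variants that the comment mentions.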

gkielian commented 2 months ago

Moving work to https://github.com/ReaLLMASIC/nanoGPT/pull/182 so the latest changes can be merged directly into the repo.