ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.
MIT License

Add Softplus activation and update inspect script #162

Closed: gkielian closed this 3 months ago

gkielian commented 3 months ago

Softplus has a continuous derivative (the logistic sigmoid), which makes it compatible with Functional Interpolation-style embeddings.
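To illustrate the point about smoothness, here is a minimal scalar sketch in plain Python. It is not the PyTorch module added in this PR. It computes softplus with a `beta` sharpness parameter, using the same formula as `torch.nn.Softplus`, and its derivative, which is the sigmoid of `beta * x`. It then compares that derivative with ReLU's, which jumps at zero. The function names are hypothetical.

```python
import math

def softplus(x: float, beta: float = 1.0) -> float:
    """Softplus (1/beta) * log(1 + exp(beta * x)), written in an overflow-safe form."""
    z = beta * x
    # log(1 + e^z) == max(z, 0) + log1p(e^-|z|), which never exponentiates a large positive number.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta

def softplus_grad(x: float, beta: float = 1.0) -> float:
    """d/dx softplus = sigmoid(beta * x): continuous everywhere."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)

def relu_grad(x: float) -> float:
    """ReLU's derivative jumps from 0 to 1 at x = 0 (0 is used at x = 0 by convention here)."""
    return 1.0 if x > 0 else 0.0

eps = 1e-6
print(softplus_grad(-eps), softplus_grad(eps))  # both very close to 0.5
print(relu_grad(-eps), relu_grad(eps))          # 0.0 1.0
```

Near zero the softplus gradient changes smoothly from one side to the other, while the ReLU gradient jumps. This discontinuity is the property that matters for interpolation-style embeddings.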

This PR also adds Squareplus. It looks promising, but we haven't managed to make it stable yet.
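For reference, here is a minimal scalar sketch of Squareplus, `(x + sqrt(x^2 + b)) / 2` with `b > 0`, together with its derivative. It is not the implementation in this PR, and the default `b = 4.0` is only an illustrative choice. For negative `x` the direct formula subtracts two nearly equal numbers, so the sketch switches to the algebraically equal form `b / (2 * (sqrt(x^2 + b) - x))`. This sketch does not establish whether that cancellation has anything to do with the instability mentioned above.

```python
import math

def squareplus(x: float, b: float = 4.0) -> float:
    """Squareplus: (x + sqrt(x^2 + b)) / 2; b > 0 sets the curvature near 0."""
    if b <= 0:
        raise ValueError("b must be positive")
    s = math.sqrt(x * x + b)
    if x >= 0:
        return (x + s) / 2
    # For x < 0, x + s cancels badly; multiplying by the conjugate gives
    # (x + s) = b / (s - x), where s - x = s + |x| has no cancellation.
    return b / (2 * (s - x))

def squareplus_grad(x: float, b: float = 4.0) -> float:
    """Derivative (1 + x / sqrt(x^2 + b)) / 2, continuous for any b > 0."""
    return 0.5 * (1 + x / math.sqrt(x * x + b))

print(squareplus(0.0))       # sqrt(4) / 2 = 1.0
print(squareplus_grad(0.0))  # 0.5
```

As `b` shrinks toward 0, squareplus approaches ReLU, `(x + |x|) / 2`. Any positive `b` keeps the derivative continuous, like softplus, while using only a square root rather than `exp`/`log`.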

Note: future work on optimizing these activations (and, more generally, on getting Squareplus to converge) should add configuration settings for their hyperparameters, including the softmax constant divisor.
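One way to expose these hyperparameters is a small config object that an activation factory reads. This avoids hard-coded constants. The sketch below is hypothetical throughout: `ActivationConfig`, its field names, and `build_activation` are illustrative and are not the repository's actual config. Other settings, such as the softmax constant divisor, could be added as further fields in the same way.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class ActivationConfig:
    # Hypothetical fields; the real nanoGPT/ReaLLMASIC config may name these differently.
    activation: str = "softplus"  # "softplus" or "squareplus"
    softplus_beta: float = 1.0    # sharpness of softplus
    squareplus_b: float = 4.0     # curvature of squareplus near 0 (must be > 0)

def build_activation(cfg: ActivationConfig) -> Callable[[float], float]:
    """Return a scalar activation whose hyperparameters come from cfg."""
    if cfg.activation == "softplus":
        beta = cfg.softplus_beta
        # Overflow-safe softplus: (max(z, 0) + log1p(exp(-|z|))) / beta with z = beta * x.
        return lambda x: (max(beta * x, 0.0) + math.log1p(math.exp(-abs(beta * x)))) / beta
    if cfg.activation == "squareplus":
        b = cfg.squareplus_b
        if b <= 0:
            raise ValueError("squareplus_b must be positive")
        # Direct squareplus formula: (x + sqrt(x^2 + b)) / 2.
        return lambda x: (x + math.sqrt(x * x + b)) / 2
    raise ValueError(f"unknown activation: {cfg.activation!r}")

act = build_activation(ActivationConfig(activation="squareplus", squareplus_b=1.0))
print(act(0.0))  # sqrt(1) / 2 = 0.5
```

Keeping the constants in a config lets sweeps over `softplus_beta` or `squareplus_b` run from the command line or a config file without code edits. That is the kind of tuning the note above anticipates for getting Squareplus to converge.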