ReaLLMASIC / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Softmax formulation pass #166

Closed gkielian closed 3 months ago

gkielian commented 3 months ago

Change Summary:

  1. Reverted a base change to Softmax introduced in a Jan 28 commit, which was missed during review.

Thanks Karthik for finding this! : )

  2. Cleaned up the Polymax section:

     a. Removed PolymaxQuan -- it should adhere to the ConSmaxQuan format.
     b. Moved the existing Polymax to vpolymax -- keeping it around since it is interesting that the shape worked.
     c. Optimized the Polymax version to minimize calculations per forward pass.

  3. Added a boolean option to each relevant variation for dividing by the sequence length (a rough sketch follows below).
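For readers unfamiliar with these variations, here is a minimal sketch of what a polymax-style drop-in for softmax with the new sequence-length flag might look like. The module name `PolymaxSketch`, the quadratic/zero piecewise shape, and the flag name `div_by_seq_len` are illustrative assumptions, not the actual code in this PR:

```python
import torch
import torch.nn as nn

class PolymaxSketch(nn.Module):
    """Hypothetical polynomial softmax alternative: applies a piecewise
    polynomial activation to raw attention scores instead of the usual
    exp/normalize step. Shapes and constants are illustrative only."""

    def __init__(self, div_by_seq_len: bool = False):
        super().__init__()
        # Boolean option analogous to the PR's per-variation flag for
        # dividing activations by the sequence length.
        self.div_by_seq_len = div_by_seq_len

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch, heads, seq_len, seq_len) raw attention logits.
        # Illustrative piecewise shape: quadratic above zero, flat below;
        # the actual Polymax/vpolymax shape may differ.
        out = torch.where(scores > 0, scores ** 2, torch.zeros_like(scores))
        if self.div_by_seq_len:
            # Normalize by sequence length instead of a row-wise sum,
            # keeping activation magnitudes bounded as context grows.
            out = out / scores.size(-1)
        return out
```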

We just came across this paper two days ago; there seem to be some overlaps, and we are hoping a similar strategy could work for transformers targeted at language modeling:

https://arxiv.org/abs/2309.08586

klei22 commented 3 months ago

Looks good