bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Train gpt2 #79

Closed by bclarkson-code 3 months ago

bclarkson-code commented 3 months ago

train_smol_gpt now successfully trains a 124M parameter GPT-2 model. Note: we're training the model with a Chinchilla-optimal number of tokens (~2.5Bn) rather than the ~300Bn tokens used in the GPT-2 paper, so training doesn't take an entire year.
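
For reference, the ~2.5Bn figure follows from the Chinchilla rule of thumb of roughly 20 training tokens per model parameter. A minimal sketch of the arithmetic (the constants here are approximations for illustration, not values read from the training script):

```python
# Rough sketch of where the ~2.5Bn token budget comes from.
# The 20-tokens-per-parameter ratio is the commonly cited Chinchilla
# rule of thumb; the exact parameter count and constant are
# approximations, not values taken from train_smol_gpt.
GPT2_SMALL_PARAMS = 124_000_000       # 124M parameter GPT-2
CHINCHILLA_TOKENS_PER_PARAM = 20      # ~20 tokens per parameter

optimal_tokens = GPT2_SMALL_PARAMS * CHINCHILLA_TOKENS_PER_PARAM
print(f"Chinchilla-optimal token budget: {optimal_tokens / 1e9:.2f}Bn")
# -> ~2.48Bn tokens, vs the ~300Bn used to train the original GPT-2
```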