bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Add better debugging #37

Closed bclarkson-code closed 2 months ago

bclarkson-code commented 5 months ago

During an initial training run of SmolGPT, the model seemed to train for a bit and then plateaued. During evaluation, I found that, regardless of what the model was fed, it would just output "MMMMMMMMMMMMM" repeatedly (I guess it was hungry?). Several standard debugging steps are needed to figure out what is going wrong, and these should be added to Tricycle. Some include:
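A quick way to quantify the collapse described above is to measure how much of the sampled output consists of a single token. The sketch below is illustrative only: `dominant_token_fraction` is a hypothetical helper, not part of Tricycle, and it assumes character-level tokens so that each character of a sample counts as one token.

```python
from collections import Counter


def dominant_token_fraction(samples):
    """Return the most frequent token and its share of all generated tokens.

    Hypothetical helper: `samples` is a list of generated strings, and each
    character is treated as one token (an assumption, not Tricycle's API).
    """
    counts = Counter()
    for sample in samples:
        counts.update(sample)  # count every character in the sample
    if not counts:
        raise ValueError("no tokens to analyse")
    token, count = counts.most_common(1)[0]
    return token, count / sum(counts.values())


if __name__ == "__main__":
    # Outputs generated from different prompts; a share near 1.0 means the
    # model is ignoring its input and emitting one token.
    token, share = dominant_token_fraction(["MMMMMMMMMMMMM", "MMMMMMMM"])
    print(f"most common token: {token!r}, share: {share:.2f}")
```

A share close to 1.0 across varied prompts suggests the model has collapsed to a single token, rather than merely being undertrained.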

bclarkson-code commented 5 months ago

Added a model size calculator in #42.
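For reference, a rough sketch of what such a calculation can look like. This is not the code from #42: `gpt2_param_count` is a hypothetical function, and it assumes a standard GPT-2 layout (learned position embeddings, biases on every linear layer, a 4x MLP expansion, and an output head tied to the token embedding).

```python
def gpt2_param_count(vocab_size, context_window, embedding_dim, n_layers):
    """Count trainable parameters in a standard GPT-2 style model.

    Hypothetical helper; assumes tied input/output embeddings and biases on
    all linear layers. A different architecture will give different totals.
    """
    d = embedding_dim
    token_embedding = vocab_size * d
    position_embedding = context_window * d

    layer_norm = 2 * d                      # scale and shift
    attention_qkv = d * 3 * d + 3 * d       # fused query/key/value projection
    attention_out = d * d + d               # output projection
    mlp_in = d * 4 * d + 4 * d              # expand to 4 * d
    mlp_out = 4 * d * d + d                 # project back to d
    block = 2 * layer_norm + attention_qkv + attention_out + mlp_in + mlp_out

    final_layer_norm = 2 * d
    return token_embedding + position_embedding + n_layers * block + final_layer_norm


if __name__ == "__main__":
    # GPT-2 small configuration
    print(f"{gpt2_param_count(50257, 1024, 768, 12):,}")
```

With the GPT-2 small configuration this gives 124,439,808 parameters, which matches the commonly quoted figure of about 124M.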