bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Get GPT loss to decrease to 0 for a single batch #45

Closed: bclarkson-code closed this issue 5 months ago

bclarkson-code commented 5 months ago

To make sure that everything is working, we should be able to drive the model's loss to 0 (or very close to it) on a single batch. A correct model and training loop can simply overfit one fixed batch, so if the loss doesn't get there, there are bugs that need fixing.
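A minimal sketch of this kind of check, assuming nothing about Tricycle's actual API: a stand-in linear softmax classifier (not GPT) is trained over and over on one fixed batch with hand-written gradients, and the check passes if the final loss falls below a tolerance. Cross-entropy only approaches 0 asymptotically, so a small threshold is used instead of exact equality. The names `overfit_single_batch` and `passes_overfit_check` and the `tol=1e-2` value are hypothetical, chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def overfit_single_batch(inputs, targets, n_classes, lr=2.0, steps=500):
    """Hypothetical helper: train a linear softmax classifier on ONE fixed batch.

    Returns the mean cross-entropy loss recorded before each update.
    """
    n_samples = len(inputs)
    n_features = len(inputs[0])
    # Zero init is fine here: softmax regression is convex, so there is
    # no symmetry to break.
    weights = [[0.0] * n_features for _ in range(n_classes)]
    history = []
    for _ in range(steps):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        total_loss = 0.0
        for x, y in zip(inputs, targets):
            logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            probs = softmax(logits)
            total_loss += -math.log(probs[y])
            # d(loss)/d(logit_c) = p_c - 1[c == y]
            for c in range(n_classes):
                delta = probs[c] - (1.0 if c == y else 0.0)
                for f in range(n_features):
                    grad[c][f] += delta * x[f]
        history.append(total_loss / n_samples)
        # Plain gradient descent on the batch-mean loss
        for c in range(n_classes):
            for f in range(n_features):
                weights[c][f] -= lr * grad[c][f] / n_samples
    return history


def passes_overfit_check(history, tol=1e-2):
    """Hypothetical check: the loss on the fixed batch must end up near 0."""
    return history[-1] < tol


# One fixed batch: 4 one-hot "tokens", each with an arbitrary target class.
batch_x = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
batch_y = [0, 1, 2, 0]
history = overfit_single_batch(batch_x, batch_y, n_classes=3)
print(f"start={history[0]:.3f} end={history[-1]:.5f}")
assert passes_overfit_check(history)
```

The same harness also catches a broken training loop. With `lr=0.0`, which acts as a stand-in for "gradients never reach the weights", the loss stays at `log(3)` and the check fails.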