bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

45 get gpt loss to decrease to 0 for single batch #46

Closed bclarkson-code closed 5 months ago

bclarkson-code commented 5 months ago

The experiment script now trains a model on Shakespeare, although the loss seems to plateau at around 6. More work is needed.
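For reference, the single-batch check named in the title can be sketched in plain NumPy. This is only an illustration and does not use Tricycle's API. It assumes a toy bigram model (one row of logits per input token), a hypothetical vocabulary of 16 tokens, and a batch in which every input token appears exactly once, so the batch can be memorised perfectly. The names `cross_entropy` and `overfit_single_batch` are made up for this sketch.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean cross-entropy loss, plus the log-probabilities used for the gradient."""
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(targets)), targets].mean()
    return loss, log_probs


def overfit_single_batch(vocab_size=16, steps=500, lr=10.0, seed=0):
    """Run gradient descent on one fixed batch and return the loss at each step."""
    rng = np.random.default_rng(seed)
    # Assumption: each input token appears once, so the targets are never ambiguous.
    inputs = rng.permutation(vocab_size)
    targets = rng.integers(0, vocab_size, size=vocab_size)
    # Toy bigram "model": row i holds the logits predicted after token i.
    weights = np.zeros((vocab_size, vocab_size))
    losses = []
    for _ in range(steps):
        logits = weights[inputs]
        loss, log_probs = cross_entropy(logits, targets)
        losses.append(float(loss))
        # Gradient of mean cross-entropy w.r.t. logits: softmax minus one-hot.
        grad = np.exp(log_probs)
        grad[np.arange(len(targets)), targets] -= 1.0
        grad /= len(targets)
        # Indices in `inputs` are unique, so plain fancy-index assignment is safe.
        weights[inputs] -= lr * grad
    return losses


if __name__ == "__main__":
    history = overfit_single_batch()
    print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.4f}")
```

With all-zero weights the first loss equals ln(vocab_size), the loss of a uniform guess, and it should then fall towards 0 as the batch is memorised. If the same check on the real model stalls well above 0, that usually points at the forward pass, the gradients, or the optimiser step rather than at the data.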