bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Optimise Loss #57

Closed by bclarkson-code 3 months ago

bclarkson-code commented 3 months ago

Currently, the loss function uses more memory and runs slower than I would expect. It should be optimised.