bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Optimised cross entropy loss #60

Closed: bclarkson-code closed this 3 months ago

bclarkson-code commented 3 months ago

The cross-entropy loss function has been heavily optimised, giving a 28x (!) speedup.
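The PR text does not say how the speedup was achieved. One common way to make cross-entropy fast is to fuse softmax, log and the negative log-likelihood into a single operation with a closed-form gradient, rather than chaining them as separate autograd nodes. The sketch below shows that approach and may not match what this PR does. It assumes NumPy arrays of logits with shape `(batch, n_classes)` and integer class targets. The function name `cross_entropy` is hypothetical and is not Tricycle's API.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> tuple[float, np.ndarray]:
    """Mean cross-entropy over a batch, plus its gradient w.r.t. the logits.

    Hypothetical fused implementation (not Tricycle's actual code).
    logits:  (batch, n_classes) unnormalised scores
    targets: (batch,) integer class indices
    """
    # Subtract the row-wise max so exp() cannot overflow (log-sum-exp trick).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_probs = shifted - log_sum_exp  # log-softmax computed in one pass

    batch = logits.shape[0]
    rows = np.arange(batch)
    # Gather the correct-class log-probabilities with integer-array indexing
    # instead of building a one-hot matrix and multiplying by it.
    loss = -log_probs[rows, targets].mean()

    # Closed-form gradient of the mean loss: (softmax - one_hot) / batch.
    grad = np.exp(log_probs)
    grad[rows, targets] -= 1.0
    grad /= batch
    return float(loss), grad
```

Because the backward pass is a single expression, there is no need to store or differentiate through the intermediate softmax and log results separately. Whether this accounts for a speedup of the size reported here depends on the original implementation, which is not shown.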