bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

static allocations #53

Closed bclarkson-code closed 3 months ago

bclarkson-code commented 3 months ago

Arrays are now statically allocated. This has enabled numerous optimisations that lead to a dramatic speedup: training smolGPT to a loss of 0.5 takes ~1000 s, which is bordering on PyTorch speeds (a proper comparison to come later). Additionally, memory usage has dropped considerably, and training now requires around 5 GB.
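
To illustrate the idea (this is a minimal sketch, not Tricycle's actual implementation, and it assumes a NumPy-style backend): instead of letting every operation allocate a fresh array on each training step, all intermediate buffers are allocated once up front and reused, e.g. via `out=` arguments.

```python
import numpy as np

batch_size, n_in, n_out = 32, 256, 128
rng = np.random.default_rng(0)

weights = rng.normal(size=(n_in, n_out)).astype(np.float32)

# Statically allocated buffers, created once before the training loop.
activations = np.empty((batch_size, n_out), dtype=np.float32)
grad_weights = np.empty_like(weights)

for step in range(100):
    inputs = rng.normal(size=(batch_size, n_in)).astype(np.float32)
    grad_out = rng.normal(size=(batch_size, n_out)).astype(np.float32)

    # Writing into preallocated buffers with `out=` avoids a new
    # allocation (and the associated allocator/GC pressure) per step,
    # and keeps peak memory fixed across the whole run.
    np.matmul(inputs, weights, out=activations)
    np.matmul(inputs.T, grad_out, out=grad_weights)
```

Because every buffer is created once, peak memory is known before training starts, which is consistent with the fixed ~5 GB figure above.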