bclarkson-code / Tricycle

Autograd to GPT-2 completely from scratch

Add optimiser #9

Closed: bclarkson-code closed this 9 months ago

bclarkson-code commented 9 months ago

Added a Stochastic Gradient Descent (SGD) optimiser with weight decay and momentum.
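For readers unfamiliar with the update rule, here is a minimal sketch of SGD with weight decay and momentum in plain Python. It is not the code from this change: the class name `SGDSketch`, its parameters, and the list-of-floats interface are all assumptions made for illustration, and Tricycle's actual optimiser API may look quite different. The sketch assumes the common formulation where weight decay is added to the gradient as an L2 term before the momentum buffer is updated.

```python
class SGDSketch:
    """Hypothetical SGD optimiser with weight decay and momentum.

    Illustrative only: the names and interface are assumptions,
    not Tricycle's real API.
    """

    def __init__(self, learning_rate, weight_decay=0.0, momentum=0.0):
        self.learning_rate = learning_rate
        self.weight_decay = weight_decay
        self.momentum = momentum
        self.velocity = None  # one momentum buffer entry per weight

    def step(self, weights, grads):
        """Return updated weights given the current weights and gradients."""
        if self.velocity is None:
            self.velocity = [0.0] * len(weights)

        new_weights = []
        for i, (w, g) in enumerate(zip(weights, grads)):
            # Weight decay: add an L2 penalty term to the gradient.
            g = g + self.weight_decay * w
            # Momentum: accumulate a decaying running sum of gradients.
            self.velocity[i] = self.momentum * self.velocity[i] + g
            # Move against the accumulated direction.
            new_weights.append(w - self.learning_rate * self.velocity[i])
        return new_weights


# Usage: minimise f(w) = w^2, whose gradient is 2w.
optimiser = SGDSketch(learning_rate=0.1, weight_decay=0.0, momentum=0.9)
w = [1.0]
for _ in range(50):
    grads = [2.0 * w[0]]
    w = optimiser.step(w, grads)
```

With `momentum=0.0` and `weight_decay=0.0` this reduces to vanilla gradient descent, which makes it easy to check each feature in isolation.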