HuwCampbell / grenade

Deep Learning in Haskell

Separate optimisation from gradient calculation. #9

Closed: HuwCampbell closed this issue 7 years ago

HuwCampbell commented 8 years ago

At the moment, we have the gradient calculations interwoven with the optimisation algorithm inside runBackwards. This is a bit terrible, as it means we can't give users a choice of algorithm.

It would be better for runBackwards to return just the gradients, and have training take a (hopefully minibatched) optimisation strategy.
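For illustration, here is a minimal sketch of what a decoupled interface could look like. The names (`Optimiser`, `trainStep`) and the flat-list representation of parameters and gradients are hypothetical, not grenade's actual types; the point is only that the backward pass produces gradients, and a separate, stateful optimiser decides how to apply them.

```haskell
module OptimiserSketch where

-- Hypothetical, framework-agnostic representations for the sketch.
type Params    = [Double]
type Gradients = [Double]

-- An optimiser carries its own state (e.g. momentum velocities) and
-- knows how to turn gradients into updated parameters.
data Optimiser s = Optimiser
  { optState :: s
  , optStep  :: s -> Params -> Gradients -> (Params, s)
  }

-- One training step: the backward pass only produces gradients,
-- and the optimiser decides how to use them.
trainStep :: Optimiser s -> Params -> Gradients -> (Params, Optimiser s)
trainStep (Optimiser s step) ps gs =
  let (ps', s') = step s ps gs
  in  (ps', Optimiser s' step)
```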

Allowing SGD with momentum and Nesterov should be trivial; Adagrad and friends might need a rethink of how we keep track of training updates.
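As an example, classical SGD with momentum fits this shape naturally, since its only extra state is one velocity per parameter. A sketch against the hypothetical `Optimiser` record above (again, not grenade's API):

```haskell
-- SGD with classical momentum as an instance of the hypothetical
-- Optimiser record above.  State: one velocity per parameter.
--   v' = mu * v - lr * g
--   p' = p  + v'
sgdMomentum :: Double  -- ^ learning rate
            -> Double  -- ^ momentum coefficient
            -> Int     -- ^ number of parameters
            -> Optimiser [Double]
sgdMomentum lr mu n = Optimiser (replicate n 0) step
  where
    step vs ps gs =
      let vs' = zipWith (\v g -> mu * v - lr * g) vs gs
          ps' = zipWith (+) ps vs'
      in  (ps', vs')
```

Adagrad would slot into the same shape by carrying accumulated squared gradients as its state instead of velocities, which is where the bookkeeping question above comes in.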