At the moment, we have the gradient calculations interwoven with the optimisation algorithm inside runBackwards. This is a bit terrible, as it means we can't give users a choice of algorithm.
It would be better for runBackwards to return just the gradients, and have training take a (hopefully minibatched) optimisation strategy.
Allowing Nesterov and SGD (with momentum) should be trivial; Adagrad and friends might need a rethink in how we keep track of training updates, as sketched below.
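A rough sketch of what that separation could look like. All names here are hypothetical (not the library's current API), and weights and gradients are flattened to plain lists of Doubles rather than the real network types. The idea is that an optimisation strategy carries its own state between steps, so plain SGD, momentum/Nesterov, and Adagrad all fit one interface, and the Adagrad accumulator makes the extra per-weight bookkeeping explicit.

```haskell
module OptimiserSketch where

-- | Hypothetical optimisation strategy: it owns whatever state it needs
--   (velocities, squared-gradient accumulators, ...) and is handed only
--   the gradients that a refactored runBackwards would return.
data Optimiser s = Optimiser
  { optInit :: [Double] -> s
    -- ^ build the initial optimiser state from the initial weights
  , optStep :: s -> [Double] -> [Double] -> ([Double], s)
    -- ^ state -> weights -> gradients -> (new weights, new state)
  }

-- | SGD with momentum: the state is one velocity per weight.
sgdMomentum :: Double -> Double -> Optimiser [Double]
sgdMomentum lr mu = Optimiser
  { optInit = map (const 0)
  , optStep = \vel ws gs ->
      let vel' = zipWith (\v g -> mu * v - lr * g) vel gs
          ws'  = zipWith (+) ws vel'
      in  (ws', vel')
  }

-- | Adagrad: the state is a running sum of squared gradients, which is
--   exactly the kind of training-update tracking mentioned above.
adagrad :: Double -> Double -> Optimiser [Double]
adagrad lr eps = Optimiser
  { optInit = map (const 0)
  , optStep = \acc ws gs ->
      let acc' = zipWith (\a g -> a + g * g) acc gs
          ws'  = zipWith3 (\w g a -> w - lr * g / (sqrt a + eps)) ws gs acc'
      in  (ws', acc')
  }

-- | A training loop that only ever sees gradients and a strategy.
--   'gradsOf' stands in for a runBackwards that returns just gradients.
train :: Optimiser s -> ([Double] -> [Double]) -> [Double] -> Int -> [Double]
train opt gradsOf w0 steps = go (optInit opt w0) w0 steps
  where
    go _ ws 0 = ws
    go s ws n =
      let gs        = gradsOf ws
          (ws', s') = optStep opt s ws gs
      in  go s' ws' (n - 1)
```

With this shape, swapping the algorithm is just passing a different `Optimiser` value to `train`, and minibatching only affects how the gradients handed to `optStep` are produced, not the strategy itself.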