At the moment, we have the gradient calculations interwoven with the optimisation algorithm inside runBackwards. This is a bit terrible, as it means we can't give users a choice of algorithm.
It would be better for runBackwards to return just the gradients, and have training take a (hopefully minibatched) optimisation strategy.
Allowing Nesterov and SGD (with momentum) should be trivial; Adagrad and friends might need a rethink in how we keep track of training updates, as sketched below.
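A rough sketch of what that separation could look like. All names here are hypothetical (not the library's current API), and weights and gradients are flattened to plain lists of Doubles rather than the real network types. The idea is that an optimisation strategy carries its own state between steps, so plain SGD, momentum/Nesterov, and Adagrad all fit one interface, and the Adagrad accumulator makes the extra per-weight bookkeeping explicit.

```haskell
module OptimiserSketch where

-- | Hypothetical optimisation strategy: it owns whatever state it needs
--   (velocities, squared-gradient accumulators, ...) and is handed only
--   the gradients that a refactored runBackwards would return.
data Optimiser s = Optimiser
  { optInit :: [Double] -> s
    -- ^ build the initial optimiser state from the initial weights
  , optStep :: s -> [Double] -> [Double] -> ([Double], s)
    -- ^ state -> weights -> gradients -> (new weights, new state)
  }

-- | SGD with momentum: the state is one velocity per weight.
sgdMomentum :: Double -> Double -> Optimiser [Double]
sgdMomentum lr mu = Optimiser
  { optInit = map (const 0)
  , optStep = \vel ws gs ->
      let vel' = zipWith (\v g -> mu * v - lr * g) vel gs
          ws'  = zipWith (+) ws vel'
      in  (ws', vel')
  }

-- | Adagrad: the state is a running sum of squared gradients, which is
--   exactly the kind of training-update tracking mentioned above.
adagrad :: Double -> Double -> Optimiser [Double]
adagrad lr eps = Optimiser
  { optInit = map (const 0)
  , optStep = \acc ws gs ->
      let acc' = zipWith (\a g -> a + g * g) acc gs
          ws'  = zipWith3 (\w g a -> w - lr * g / (sqrt a + eps)) ws gs acc'
      in  (ws', acc')
  }

-- | A training loop that only ever sees gradients and a strategy.
--   'gradsOf' stands in for a runBackwards that returns just gradients.
train :: Optimiser s -> ([Double] -> [Double]) -> [Double] -> Int -> [Double]
train opt gradsOf w0 steps = go (optInit opt w0) w0 steps
  where
    go _ ws 0 = ws
    go s ws n =
      let gs        = gradsOf ws
          (ws', s') = optStep opt s ws gs
      in  go s' ws' (n - 1)
```

With this shape, swapping the algorithm is just passing a different `Optimiser` value to `train`, and minibatching only affects how the gradients handed to `optStep` are produced, not the strategy itself.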