The goal here is to take learning out of back prop. Interestingly, runBackwards now returns a Gradient type, which is often () for parameter-free layers like logit and tanh.
This means we don't have to shuffle around multiple sets of momentums; there can be just one, stored in the layer itself.
Step one in getting it ready for parallel execution.
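A minimal sketch of the idea, with assumed names (Layer, applyGradient, and the helper types are illustrative, not the actual API): each layer declares its own associated Gradient type, so stateless activations like tanh report () and the learning step becomes a no-op, while back prop itself never touches the parameters.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Hypothetical helper types, just to make the sketch self-contained.
type Input        = [Double]
type Delta        = [Double]
type LearningRate = Double

class Layer layer where
  -- Each layer picks its own gradient representation.
  type Gradient layer
  -- Backward pass returns the layer's gradient alongside the
  -- delta to propagate; it no longer updates any parameters.
  runBackwards  :: layer -> Input -> Delta -> (Gradient layer, Delta)
  -- Learning is applied separately, outside back prop.
  applyGradient :: LearningRate -> layer -> Gradient layer -> layer

data Tanh = Tanh

instance Layer Tanh where
  -- No trainable parameters, so the gradient is just ().
  type Gradient Tanh = ()
  runBackwards Tanh input delta =
    ((), zipWith (\x d -> d * (1 - tanh x ^ 2)) input delta)
  applyGradient _ l () = l  -- nothing to learn
```

Because applyGradient is the only place parameters change, momentum state can live inside each parameterised layer, and backward passes over different batches can in principle run in parallel before their gradients are applied.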