HuwCampbell closed 7 years ago
I don't think this is the best way to go. Minibatching can be simpler using a type.
did you end up doing anything with batches?
it's not totally clear to me how it should work
i've done this:
type CppnNetMain batches
  = Network
      '[ Reshape
       , FullyConnected (batches*TotalDim) (batches*Hidden), Tanh
       , FullyConnected (batches*Hidden) (batches*OutDim), Logit
       ]
      [ 'D2 batches TotalDim, 'D1 (batches*TotalDim)
      , 'D1 (batches*Hidden), 'D1 (batches*Hidden)
      , 'D1 (batches*OutDim), 'D1 (batches*OutDim)
      ]
but i don't think it's a good idea, because it seems slower?! (for larger batches)
-- edit: to be fair, i think the slowdown is actually due to concatenation that i'm doing after the forward pass (edit again: actually, i'm not so sure ...)
So the main benefit you'll get with minibatching is that one matrix-matrix multiplication is much faster than many matrix-vector ones (one per example).
Unfortunately, just lengthening the vectors won't help. Indeed, what you've got there isn't actually minibatching at all, as the examples are now non-linearly connected to each other through the fully connected layers (whose weight matrices are now n^2 times bigger for a batch of n).
For convolutional nets, where it's already matrix-matrix multiplications under the covers, minibatching will probably buy you quite a bit less benefit.
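To make the distinction concrete, here is a minimal sketch using plain lists rather than Grenade's or hmatrix's types. The names (`matVec`, `matMat`, `weights`, `examples`) are made up for illustration. It only shows that stacking the examples as columns and doing one matrix-matrix product gives the same outputs as one matrix-vector product per example, with the weight matrix unchanged; it says nothing about speed, which comes from an optimised BLAS routine underneath.

```haskell
module Main where

import Data.List (transpose)

type Vector = [Double]
type Matrix = [[Double]] -- a list of rows

-- One matrix-vector product: what a fully connected layer does per example.
matVec :: Matrix -> Vector -> Vector
matVec w x = [sum (zipWith (*) row x) | row <- w]

-- One matrix-matrix product: each column of the right-hand matrix is an example.
matMat :: Matrix -> Matrix -> Matrix
matMat a b = [[sum (zipWith (*) row col) | col <- transpose b] | row <- a]

-- Hypothetical layer with 2 inputs and 3 outputs.
weights :: Matrix
weights = [[1, 2], [3, 4], [5, 6]]

-- A batch of three examples, each of length 2.
examples :: [Vector]
examples = [[1, 0], [0, 1], [1, 1]]

-- Many matrix-vector products, one per example.
perExample :: [Vector]
perExample = map (matVec weights) examples

-- A single matrix-matrix product over the whole batch, using the same weights.
batched :: [Vector]
batched = transpose (matMat weights (transpose examples))

main :: IO ()
main = print (perExample == batched)
```

By contrast, concatenating n examples into one vector of length n*TotalDim needs a weight matrix with n^2 times as many entries, and every entry outside the diagonal blocks mixes one example into another's output.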
So what should I do if I want to get some kind of batching behaviour here? Can I do fully-connected on 2d things? No, right?
Matrix-matrix multiplications are significantly faster than many matrix-vector ones. So minibatching over layers which do this is definitely worthwhile.
With improvements to how LSTM layers update, I think we could get a 20x speed increase for decent-sized layers.
To do this, my thought is to have an injective type family for minibatches, and to allow either runForwards or runBatchForwards (or both) to be written for each layer, with each one's default either lifting a single example into a batch or just running many examples in parallel sparks.
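A rough sketch of how that could look, under heavy assumptions: `Example`, `Examples`, `Batched`, `Tanh` and `Scale` here are hypothetical stand-ins over plain lists, not Grenade's actual types or class, and tapes and gradients are left out entirely. The point is only the mutual defaults, so a layer can implement whichever of the two methods is natural for it.

```haskell
{-# LANGUAGE TypeFamilyDependencies #-}
module Main where

-- Hypothetical stand-ins for a single example and a minibatch.
newtype Example = Example [Double] deriving (Eq, Show)
newtype Examples = Examples [[Double]] deriving (Eq, Show)

-- Injective type family: the batch type determines the single-example type.
type family Batched s = b | b -> s where
  Batched Example = Examples

class Layer l where
  -- Default: lift the single example into a batch of one.
  runForwards :: l -> Example -> Example
  runForwards l (Example x) =
    case runBatchForwards l (Examples [x]) of
      Examples [y] -> Example y
      _            -> error "runBatchForwards changed the batch size"

  -- Default: run each example on its own. A plain map is used here;
  -- evaluating the elements in parallel sparks would be the other option.
  runBatchForwards :: l -> Batched Example -> Batched Example
  runBatchForwards l (Examples xs) =
    Examples [y | x <- xs, let Example y = runForwards l (Example x)]

  {-# MINIMAL runForwards | runBatchForwards #-}

-- A layer that only defines the single-example pass.
data Tanh = Tanh
instance Layer Tanh where
  runForwards _ (Example x) = Example (map tanh x)

-- A layer that only defines the batched pass.
newtype Scale = Scale Double
instance Layer Scale where
  runBatchForwards (Scale k) (Examples xs) = Examples (map (map (k *)) xs)

main :: IO ()
main = do
  print (runBatchForwards Tanh (Examples [[0], [0]]))
  print (runForwards (Scale 2) (Example [1, 2]))
```

The `MINIMAL` pragma makes GHC warn if an instance defines neither method, which would otherwise loop forever through the two defaults.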
The biggest question is how to efficiently store the tapes (what is Tapes? This is easier if I don't make the tape change, but I still think I should).