HuwCampbell / grenade

Deep Learning in Haskell
BSD 2-Clause "Simplified" License

Minibatch type family #21

Closed HuwCampbell closed 7 years ago

HuwCampbell commented 7 years ago

A single matrix-matrix multiplication is significantly faster than the equivalent sequence of matrix-vector ones, so minibatching over layers whose forward pass is a matrix-vector product is definitely worthwhile.

With improvements to how LSTM layers update, I think we could get a 20x speed increase for decent-sized layers.

To do this, my thought is to have an injective type family for minibatches, and to allow either runForwards or runBatchForwards (or both) to be written for each layer, with each having a default in terms of the other: runForwards lifting its input into a batch of one, or runBatchForwards running many single examples in parallel sparks.

The biggest question is how to efficiently store the tapes (what is Tapes? This is easier if I don't make the tape type change, but I still think I should).

type family MiniBatch (n :: Nat) (s :: Shape) = (b :: Shape) | b -> n s where
  MiniBatch n ('D1 x)     = ('D2 x n)
  MiniBatch n ('D2 x y)   = ('D3 x y n)
  MiniBatch n ('D3 x y z) = ('D4 x y z n)  -- or perhaps: Vec n ('D3 x y z)

class UpdateLayer x => Layer x (i :: Shape) (o :: Shape) where
  ...
  runBatchForwards  :: x -> S (MiniBatch n i)
                    -> (Tapes n x i o, S (MiniBatch n o))
  runBatchBackwards :: x -> Tapes n x i o
                    -> S (MiniBatch n o) -> S (MiniBatch n i)
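
As a point of reference, the "running many in parallel sparks" default could be a one-liner over the existing single-example interface. A minimal sketch, using a plain list in place of the MiniBatch family (runBatchForwardsDefault is a hypothetical name, not part of grenade):

import Control.Parallel.Strategies (parMap, rseq)
import Grenade

-- Hypothetical default: run the ordinary single-example forward pass
-- once per example, sparking the evaluations in parallel.
runBatchForwardsDefault
  :: Layer x i o
  => x -> [S i] -> ([Tape x i o], [S o])
runBatchForwardsDefault layer inputs =
  unzip (parMap rseq (runForwards layer) inputs)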
HuwCampbell commented 7 years ago

I don't think this is the best way to go; minibatching can be done more simply with an ordinary type instead of a type family.
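
Purely as a guess at what that could look like, a sketch using a concrete wrapper instead of the type family (the Batch name and the sized-vector representation are assumptions, not grenade's API):

{-# LANGUAGE DataKinds, KindSignatures #-}

import GHC.TypeLits (Nat)
import qualified Data.Vector.Sized as V  -- from the vector-sized package
import Grenade (S, Shape)

-- A batch is just n single-example values of shape s; no injective
-- type family needed, and the tapes can likewise be a sized vector
-- of per-example tapes.
newtype Batch (n :: Nat) (s :: Shape) = Batch (V.Vector n (S s))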

silky commented 6 years ago

Did you end up doing anything with batches?

It's not totally clear to me how it should work.

silky commented 6 years ago

I've done this:

type CppnNetMain batches
  = Network '[ Reshape
             , FullyConnected (batches*TotalDim) (batches*Hidden), Tanh
             , FullyConnected (batches*Hidden)   (batches*OutDim), Logit
             ]
            '[ 'D2 batches TotalDim, 'D1 (batches*TotalDim)
             , 'D1 (batches*Hidden), 'D1 (batches*Hidden)
             , 'D1 (batches*OutDim), 'D1 (batches*OutDim)
             ]

but I don't think it's a good idea, because it actually seems slower for larger batches?!

-- edit: to be fair, I think the slowdown is actually due to the concatenation I'm doing after the forward pass (edit again: actually, I'm not so sure ...)

HuwCampbell commented 6 years ago

So the main benefit you'll get from minibatching is that matrix-matrix multiplications are much faster than many matrix-vector ones (one per example).

Unfortunately, just lengthening the vectors won't help, and indeed, what you've got there isn't actually minibatching at all: the examples are now non-linearly connected through the fully connected layer (whose matrix is now n^2 bigger).
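
To make the distinction concrete, a small illustrative sketch using hmatrix (not grenade's internals; the names and dimensions are made up): stacking examples as columns turns n matrix-vector products into one matrix-matrix product against the same weight matrix, whereas concatenating them into one long vector forces a single weight matrix that is n times larger in each dimension.

import Prelude hiding ((<>))  -- hmatrix's (<>) clashes with Semigroup's
import Numeric.LinearAlgebra

-- True minibatching: one weight matrix, examples stacked as columns.
-- w :: hidden >< input, xs :: input >< n, result :: hidden >< n.
batchedForward :: Matrix Double -> Matrix Double -> Matrix Double
batchedForward w xs = w <> xs  -- a single matrix-matrix multiply

-- The same outputs computed as n separate matrix-vector products.
perExample :: Matrix Double -> [Vector Double] -> [Vector Double]
perExample w = map (w #>)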

For convolutional nets, where the forward pass is already matrix-matrix multiplications under the covers, minibatching will probably buy you a good deal less.

silky commented 6 years ago

So what should I do if I want to get some kind of batching behaviour here? Can I do fully-connected layers on 2D things? No, right?