We can stack linear convolutions to reduce the number of parameters in a layer. This is most pressing to implement in the TconvLayers. We want factorized spatiotemporal convolutions: convolve with a spatial kernel first and then a temporal kernel, instead of doing a full 3D convolution.
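A minimal numpy sketch of the idea (framework-agnostic, since no specific library is named here): when the full 3D kernel happens to be separable, i.e. a rank-1 product of a temporal kernel and a spatial kernel, applying the spatial 2D convolution per frame and then the temporal 1D convolution gives exactly the same output as the full 3D convolution. Learned factorized layers are not restricted to this rank-1 case, but it shows why the factorization is a sensible parameterization.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 10))   # (time, height, width) input volume

wt = rng.standard_normal(3)            # temporal kernel, length 3
ws = rng.standard_normal((5, 5))       # spatial kernel, 5x5

# Full 3D cross-correlation ("valid") with the rank-1 kernel wt (x) ws
w3d = wt[:, None, None] * ws[None, :, :]
win = sliding_window_view(x, w3d.shape)
full = np.einsum('tyxijk,ijk->tyx', win, w3d)

# Factorized version: spatial 2D conv on each frame, then temporal 1D conv
win_s = sliding_window_view(x, ws.shape, axis=(1, 2))
spatial = np.einsum('tyxij,ij->tyx', win_s, ws)
win_t = sliding_window_view(spatial, wt.shape[0], axis=0)
fact = np.einsum('tyxi,i->tyx', win_t, wt)

print(np.allclose(full, fact))  # True
```

Parameter count for this example: the full 3D kernel has 3*5*5 = 75 weights, while the factorized pair has 5*5 + 3 = 28.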
Similarly, we can make bigger spatial kernels by stacking linear convolution layers.
Three 5x5 2D convolutions in a row have a 13x13 receptive field, so they can mimic a 13x13 convolutional kernel with 75 parameters instead of 169.
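The receptive-field arithmetic can be checked directly; a 1D numpy sketch keeps it short, and the 2D case works the same way per axis. Composing three 5-tap kernels yields one kernel of length 5 + 5 + 5 - 2 = 13, and convolving with it matches the stacked result. Note the stack can only realize 13-tap kernels of this composed form, not arbitrary ones.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
k1, k2, k3 = (rng.standard_normal(5) for _ in range(3))

# Three stacked 5-tap convolutions applied to the signal...
stacked = np.convolve(np.convolve(np.convolve(x, k1, 'valid'), k2, 'valid'),
                      k3, 'valid')

# ...equal one convolution with the composed kernel
combo = np.convolve(np.convolve(k1, k2), k3)
direct = np.convolve(x, combo, 'valid')

print(combo.size)                    # 13
print(np.allclose(stacked, direct))  # True
```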
I've seen this show up in a number of papers now, both on neural modeling and in the domain of video prediction / segmentation.