We can stack linear convolutions to reduce the number of parameters in a layer. This is most pressing to implement in the TconvLayers. We want factorized spatiotemporal convolutions: convolve with a spatial kernel first and then a temporal kernel, instead of doing a full 3D convolution.
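A minimal numpy sketch of the idea (framework-agnostic, since no specific library is named here): when the full 3D kernel happens to be separable, i.e. a rank-1 product of a temporal kernel and a spatial kernel, applying the spatial 2D convolution per frame and then the temporal 1D convolution gives exactly the same output as the full 3D convolution. Learned factorized layers are not restricted to this rank-1 case, but it shows why the factorization is a sensible parameterization.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 10))   # (time, height, width) input volume

wt = rng.standard_normal(3)            # temporal kernel, length 3
ws = rng.standard_normal((5, 5))       # spatial kernel, 5x5

# Full 3D cross-correlation ("valid") with the rank-1 kernel wt (x) ws
w3d = wt[:, None, None] * ws[None, :, :]
win = sliding_window_view(x, w3d.shape)
full = np.einsum('tyxijk,ijk->tyx', win, w3d)

# Factorized version: spatial 2D conv on each frame, then temporal 1D conv
win_s = sliding_window_view(x, ws.shape, axis=(1, 2))
spatial = np.einsum('tyxij,ij->tyx', win_s, ws)
win_t = sliding_window_view(spatial, wt.shape[0], axis=0)
fact = np.einsum('tyxi,i->tyx', win_t, wt)

print(np.allclose(full, fact))  # True
```

Parameter count for this example: the full 3D kernel has 3*5*5 = 75 weights, while the factorized pair has 5*5 + 3 = 28.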
Similarly, we can make bigger spatial kernels by stacking linear convolution layers.
Three 5x5 2D convolutions in a row have a 13x13 receptive field, so they can mimic a 13x13 convolutional kernel with 75 parameters instead of 169.
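The receptive-field arithmetic can be checked directly; a 1D numpy sketch keeps it short, and the 2D case works the same way per axis. Composing three 5-tap kernels yields one kernel of length 5 + 5 + 5 - 2 = 13, and convolving with it matches the stacked result. Note the stack can only realize 13-tap kernels of this composed form, not arbitrary ones.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
k1, k2, k3 = (rng.standard_normal(5) for _ in range(3))

# Three stacked 5-tap convolutions applied to the signal...
stacked = np.convolve(np.convolve(np.convolve(x, k1, 'valid'), k2, 'valid'),
                      k3, 'valid')

# ...equal one convolution with the composed kernel
combo = np.convolve(np.convolve(k1, k2), k3)
direct = np.convolve(x, combo, 'valid')

print(combo.size)                    # 13
print(np.allclose(stacked, direct))  # True
```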
I've seen this show up in a number of papers now, both on neural modeling and in the domain of video prediction / segmentation.