simplify the kernel-based (conv3d.h) and toeplitz-based implementations so that they can be unit tested outside the layer_t interface
try to write the output|gparam|ginput as a single matrix multiplication
apps/convnet: compute the optimal number of feature planes (such that the number of parameters per layer is approximately constant), given the kernel size & connectivity per layer
bonus: extend the convolution layer to support a number of output planes that is not a multiple of the connectivity factor
bonus: generalize the convolution layer to use a generic connectivity matrix
try to reduce the number of buffers for the toeplitz-based operator
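
sketch for the apps/convnet item: one possible way to pick the number of feature planes per layer under a fixed parameter budget. Everything here is an assumption made for illustration, not the library's API: the names conv_spec_t, param_count and plan_planes are hypothetical, and the parameter count assumes each output plane is connected to imaps / kconn input planes with a krows x kcols kernel each, plus one bias. The number of output planes is rounded down to a multiple of the connectivity factor of the current and of the next layer, since the layer currently requires that. Requires C++17 for std::lcm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical description of one convolution layer (not a library type).
struct conv_spec_t
{
    std::int64_t krows;  // kernel height
    std::int64_t kcols;  // kernel width
    std::int64_t kconn;  // connectivity factor
};

// Assumed parameter count: every output plane sees imaps / kconn input planes,
// each through a krows x kcols kernel, plus one bias per output plane.
inline std::int64_t param_count(std::int64_t imaps, std::int64_t omaps, const conv_spec_t& spec)
{
    assert(spec.kconn > 0 && imaps % spec.kconn == 0);
    return omaps * ((imaps / spec.kconn) * spec.krows * spec.kcols + 1);
}

// For each layer, choose the largest number of output planes that fits the budget
// and is a multiple of both this layer's and the next layer's connectivity factor.
inline std::vector<std::int64_t> plan_planes(
    std::int64_t imaps, std::int64_t budget, const std::vector<conv_spec_t>& specs)
{
    std::vector<std::int64_t> planes;
    for (std::size_t i = 0; i < specs.size(); ++i)
    {
        const auto& spec = specs[i];
        const std::int64_t next_kconn = (i + 1 < specs.size()) ? specs[i + 1].kconn : spec.kconn;
        const std::int64_t step = std::lcm(spec.kconn, next_kconn);

        // Parameters added by a single output plane of this layer.
        const std::int64_t per_plane = param_count(imaps, 1, spec);

        // Round down to a multiple of step, but keep at least step planes
        // (a very small budget can therefore be exceeded).
        const std::int64_t omaps = std::max(step, budget / per_plane / step * step);

        planes.push_back(omaps);
        imaps = omaps;  // the output planes feed the next layer
    }
    return planes;
}
```

usage: plan_planes(3, 10000, {{5, 5, 1}, {3, 3, 2}, {3, 3, 4}}) plans three layers on a 3-plane input with a budget of 10000 parameters per layer. Rounding down keeps each layer at or below the budget, so the count is only approximately constant across layers.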