another variation for convolutional networks without padding: 2 stacked convolution layers can be combined with a single convolution layer whose mask size is the sum of the mask sizes of those 2 layers minus 1, since both branches then produce feature maps of the same spatial size. For example: (3x3 + act + 5x5) can be combined with (7x7), because 3 + 5 - 1 = 7
the combination can be generalized: plus (like in residual networks), cat4 (concatenation of the feature maps), weighting...
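
A minimal sketch of the idea, in 1D and plain Python to keep it self-contained. It relies on the fact that an unpadded convolution with a k-tap mask shortens its input by k - 1, so a 3-tap layer followed by a 5-tap layer removes 2 + 4 = 6 samples, the same as a single 7-tap layer. The helper names (`valid_conv1d`, `relu`), the mask values, and the weight `alpha` are all hypothetical and chosen only for illustration; a real network would use learned 2D masks.

```python
def valid_conv1d(signal, kernel):
    """Unpadded 1D cross-correlation; output length is len(signal) - len(kernel) + 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise activation (the 'act' between the two layers)."""
    return [max(0.0, v) for v in values]


# Toy input of length 20 (arbitrary values, for illustration only).
signal = [float(i % 5) for i in range(20)]

# Branch A: 3-tap conv + act + 5-tap conv, no padding.
branch_a = valid_conv1d(relu(valid_conv1d(signal, [1.0, 0.0, -1.0])), [0.2] * 5)

# Branch B: a single 7-tap conv, since 3 + 5 - 1 = 7.
branch_b = valid_conv1d(signal, [1.0 / 7] * 7)

# Both branches have the same length, so they can be merged element-wise.
assert len(branch_a) == len(branch_b) == len(signal) - 7 + 1

# "plus", as in residual networks.
summed = [a + b for a, b in zip(branch_a, branch_b)]

# Concatenation of the feature maps: each branch becomes one channel.
concatenated = [branch_a, branch_b]

# Weighting: a convex mix with a hypothetical scalar weight.
alpha = 0.3
weighted = [alpha * a + (1 - alpha) * b for a, b in zip(branch_a, branch_b)]
```

The same size argument holds per spatial axis in 2D, which is why (3x3 + act + 5x5) lines up with (7x7) when no padding is used.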