Unstructured sparse matrix operations are not typically faster than dense matrix operations until a very high level of sparsity.
Our model structure puts nearly all of the computation into dense 1×1 convolutions. This can be implemented with highly optimized general matrix multiply (GEMM) functions. Convolutions are often implemented via a GEMM, but they require an initial reordering in memory, called im2col, to map the convolution onto a GEMM. For instance, this approach is used in the popular Caffe package [15].
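To make the im2col-then-GEMM idea concrete, here is a minimal numpy sketch (not Caffe's actual implementation; stride 1, no padding, and the function names are my own):

```python
import numpy as np

def im2col(x, k):
    # x: (C, H, W) input; k: square kernel size (stride 1, no padding).
    # Each k*k patch of the input becomes one column, so the whole
    # convolution collapses into a single GEMM afterwards.
    C, H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

def conv2d_gemm(x, w):
    # w: (C_out, C_in, k, k) filters.
    c_out, c_in, k, _ = w.shape
    cols = im2col(x, k)                  # the memory reordering step
    out = w.reshape(c_out, -1) @ cols    # the GEMM
    out_h = x.shape[1] - k + 1
    return out.reshape(c_out, out_h, -1)

x = np.random.randn(3, 8, 8)
w = np.random.randn(16, 3, 3, 3)
y = conv2d_gemm(x, w)   # shape (16, 6, 6)
```

The im2col step is pure data movement; the arithmetic all happens in the one matrix multiply, which is why a fast BLAS makes the whole convolution fast.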
1×1 convolutions do not require this reordering in memory and can be implemented directly with GEMM which is one of the most optimized numerical linear algebra algorithms.
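A quick numpy sketch of why no reordering is needed for the 1×1 case: flattening the spatial dimensions already puts the input in GEMM layout (names here are illustrative, not from the repo):

```python
import numpy as np

def conv1x1(x, w):
    # x: (C_in, H, W) input; w: (C_out, C_in) pointwise filters.
    # A 1x1 convolution is exactly one GEMM over the flattened
    # spatial positions -- no im2col copy is required.
    c_in, H, W = x.shape
    y = w @ x.reshape(c_in, H * W)   # single GEMM call
    return y.reshape(-1, H, W)

x = np.random.randn(32, 7, 7)
w = np.random.randn(64, 32)
y = conv1x1(x, w)   # shape (64, 7, 7)
```

Here `reshape` is just a view of the same memory, so the data is never copied the way im2col copies it.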
MobileNet spends 95% of its computation time in 1×1 convolutions, which also contain 75% of the parameters.
So I want to ask: does this repo use a highly optimized GEMM? And also: does MobileNet need im2col before the standard and depthwise convolutions?
Thanks!
The content quoted above is from the original paper.