Maratyszcza / NNPACK

Acceleration package for neural networks on multi-core CPUs
BSD 2-Clause "Simplified" License

Performance not so good on armv7 cpu #46

Closed knsong closed 7 years ago

knsong commented 7 years ago

@Maratyszcza can you give me some hints: in which cases does NNPACK perform better than im2col+sgemm using openblas/eigen on an armv7 cpu? I got a result similar to @conansherry's (my net architecture is: input 60x60, a stack of conv5x5, conv1x1, conv3x3 etc., stride == 1), and I'm wondering why, in detail, the fast algorithms in NNPACK seem to be inferior to openblas/eigen in this case.

Also, how should I understand your comment in issue #39:

"When the number of channels on the input to convolution is small, the operation is similar to outer product: it is intrinsically memory bound, and fast algorithms in NNPACK do not help with performance."

Why would the fast algorithms in NNPACK be memory bound when the number of input channels to the convolution is small, and thus be inferior to openblas/eigen? I think in this case im2col+sgemm using openblas/eigen also needs to perform an sgemm operation similar to an outer product and is also memory bound, but it is faster. What slows down nnpack here?

I must have missed something and need to hack into nnpack more thoroughly. Anyway, any little advice will be of great help. Thanks.

Maratyszcza commented 7 years ago

NNPACK convolution (inference mode) performs best when:

Fast convolution in NNPACK consists of Fast Fourier/Winograd Transforms and GEMM-like operations. GEMM-like operations are compute-bound, and FFT/WT are bandwidth-bound. When the image size and the number of channels are large, GEMM-like operations dominate the runtime in NNPACK, and because NNPACK does fewer FLOPs overall than direct/SGEMM-based convolution, it performs better overall. When the number of input channels is small, the GEMM-like operations are only a small fraction of the runtime, and the algorithmic speedup on these parts is not enough to compensate for the cost of the transforms.
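To make the trade-off above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch and not NNPACK's actual implementation or cost model: the function name `winograd_cost`, the default of 6x6 output tiles with a 3x3 kernel (8x8 transformed tiles), and the choice to ignore the kernel transform (on the assumption that it can be precomputed for inference) are all illustrative assumptions. It counts multiply-accumulates (MACs) for direct convolution and for the GEMM-like stage, and the number of elements the bandwidth-bound input and output transforms have to produce.

```python
import math


def winograd_cost(height, width, c_in, c_out, kernel=3, tile_out=6):
    """Rough cost model for one stride-1 convolution layer.

    Hypothetical helper for illustration only. Assumes the output is
    height x width and that the kernel transform is precomputed.
    """
    # Each transformed tile covers tile_out x tile_out outputs and has
    # (tile_out + kernel - 1)^2 elements, e.g. 8x8 for F(6x6, 3x3).
    tile = tile_out + kernel - 1
    tiles = math.ceil(height / tile_out) * math.ceil(width / tile_out)

    # Direct (or im2col+SGEMM) convolution: one MAC per output pixel,
    # per input/output channel pair, per kernel tap.
    direct_macs = height * width * c_in * c_out * kernel * kernel

    # GEMM-like stage: one MAC per transformed element,
    # per input/output channel pair.
    gemm_macs = tiles * tile * tile * c_in * c_out

    # Transforms: every input channel and every output channel of every
    # tile is transformed once; this work is bandwidth-bound.
    transform_elems = tiles * tile * tile * (c_in + c_out)

    return {
        "direct_macs": direct_macs,
        "gemm_macs": gemm_macs,
        "transform_elems": transform_elems,
        # How much arithmetic the fast algorithm saves in the GEMM stage.
        "mac_reduction": direct_macs / gemm_macs,
        # GEMM work available to amortize each transformed element;
        # equals c_in * c_out / (c_in + c_out).
        "macs_per_transformed_elem": gemm_macs / transform_elems,
    }


if __name__ == "__main__":
    for c_in, c_out in [(3, 64), (256, 256)]:
        cost = winograd_cost(60, 60, c_in, c_out)
        print(c_in, c_out, cost)
```

Under these assumptions, a 60x60 layer with 3 input and 64 output channels has fewer than 3 GEMM MACs per transformed element, while a 256-to-256 channel layer has 128. The MAC reduction in the GEMM-like stage is the same in both cases (about 5x), but with few input channels there is very little compute-bound work to amortize the transforms against.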

knsong commented 7 years ago

Thanks a lot for your answer. It's quite clear now.