NNPACK is an acceleration package for neural network computations. NNPACK aims to provide high-performance implementations of convnet layers for multi-core CPUs.
NNPACK is not intended to be used directly by machine learning researchers; instead, it provides low-level performance primitives to be leveraged by higher-level frameworks, such as Caffe, Torch, MXNet, Theano, TensorFlow, and Mocha.jl.
Portable backends are available through the `--backend=psimd` or `--backend=scalar` configuration options, but for performance reasons they are not recommended for production use.

Fast convolution algorithms based on the Fourier transform and the Winograd transform.
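The Winograd approach trades multiplications for cheap additions. The following pure-Python sketch shows the smallest one-dimensional case, F(2, 3). It computes two outputs of a 3-tap filter with 4 multiplications instead of 6. It is a textbook illustration, not NNPACK's code: the F(6x6, 3x3) variant benchmarked below is the larger, two-dimensional member of the same family. The function names are hypothetical, and CNN-style "convolution" is treated as correlation (no kernel flip).

```python
def direct_correlation(x, g):
    """Reference: valid 1D correlation y[i] = sum_k x[i + k] * g[k]."""
    n = len(x) - len(g) + 1
    return [sum(x[i + k] * g[k] for k in range(len(g))) for i in range(n)]


def winograd_f23(d, g):
    """Winograd F(2, 3): two outputs of a 3-tap correlation from a
    4-element input tile, using 4 multiplications instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform; in a real layer this is computed once per kernel.
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform followed by the 4 element-wise multiplications.
    m0 = (d0 - d2) * u0
    m1 = (d1 + d2) * u1
    m2 = (d2 - d1) * u2
    m3 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


def winograd_correlation(x, g):
    """Split x into overlapping 4-element tiles with stride 2.
    Assumes len(x) - 2 is even so that every tile is full."""
    out = []
    for i in range(0, len(x) - 3, 2):
        out.extend(winograd_f23(x[i:i + 4], g))
    return out


print(winograd_correlation([1, 2, 3, 4, 5, 6, 7, 8], [1, 0, -1]))
# [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0]
```

The same counting applies in two dimensions. Ignoring transform costs, F(6x6, 3x3) multiplies one 8x8 = 64-element tile to produce 36 outputs. A direct 3x3 convolution needs 36 x 9 = 324 multiplications for those outputs.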
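Execution time per convolutional layer, NNPACK versus Caffe (lower is better). N/A marks layers with 5x5 kernels, to which Winograd F(6x6, 3x3) does not apply.

Library | Caffe | NNPACK | NNPACK | NNPACK |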
---|---|---|---|---|
Algorithm | im2col + sgemm | FFT-8x8 | FFT-16x16 | Winograd F(6x6, 3x3) |
AlexNet:conv2 | 315 ms | 129 ms | 86 ms | N/A |
AlexNet:conv3 | 182 ms | 87 ms | 44 ms | 70 ms |
AlexNet:conv4 | 264 ms | 109 ms | 56 ms | 89 ms |
AlexNet:conv5 | 177 ms | 77 ms | 40 ms | 64 ms |
VGG-A:conv1 | 255 ms | 303 ms | 260 ms | 404 ms |
VGG-A:conv2 | 902 ms | 369 ms | 267 ms | 372 ms |
VGG-A:conv3.1 | 566 ms | 308 ms | 185 ms | 279 ms |
VGG-A:conv3.2 | 1091 ms | 517 ms | 309 ms | 463 ms |
VGG-A:conv4.1 | 432 ms | 228 ms | 149 ms | 188 ms |
VGG-A:conv4.2 | 842 ms | 402 ms | 264 ms | 329 ms |
VGG-A:conv5 | 292 ms | 141 ms | 83 ms | 114 ms |
OverFeat:conv2 | 424 ms | 158 ms | 73 ms | N/A |
OverFeat:conv3 | 250 ms | 69 ms | 74 ms | 54 ms |
OverFeat:conv4 | 927 ms | 256 ms | 272 ms | 173 ms |
OverFeat:conv5 | 1832 ms | 466 ms | 524 ms | 315 ms |
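The Caffe column uses im2col + sgemm. Every input patch is copied into one column of a matrix, so the whole convolution becomes a single matrix multiplication. The pure-Python sketch below shows that baseline and checks it against a direct loop. It assumes a single input channel, stride 1, and no padding, and it is not Caffe's code; all function names are hypothetical.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image (list of rows).
    Returns a (kh*kw) x (oh*ow) matrix plus the output height and width."""
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    cols = []
    for r in range(kh):
        for c in range(kw):
            cols.append([image[y + r][x + c] for y in range(oh) for x in range(ow)])
    return cols, oh, ow


def matmul(a, b):
    """Plain triple-loop GEMM standing in for an optimized sgemm."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conv2d_im2col(image, kernels):
    """One output map per kernel; every kernel has the same kh x kw shape."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    cols, oh, ow = im2col(image, kh, kw)
    # Flatten each kernel in the same (r, c) order used by im2col.
    weights = [[k[r][c] for r in range(kh) for c in range(kw)] for k in kernels]
    flat = matmul(weights, cols)  # shape: (num_kernels, oh*ow)
    return [[row[y * ow:(y + 1) * ow] for y in range(oh)] for row in flat]


def conv2d_direct(image, kernel):
    """Reference: direct valid correlation, as used by CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + r][x + c] * kernel[r][c] for r in range(kh) for c in range(kw))
             for x in range(ow)] for y in range(oh)]


image = [[4 * r + c for c in range(4)] for r in range(4)]
print(conv2d_im2col(image, [[[1, 0], [0, -1]]]))
# [[[-5, -5, -5], [-5, -5, -5], [-5, -5, -5]]]
```

The column matrix holds kh*kw copies of most input pixels. That extra memory traffic is one reason the transform-based algorithms in the table can win.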
- Built-in expert-tuned kernels with very high performance, including the Fourier and Winograd transforms.
- Multi-threaded SIMD-aware implementations of neural network layers.
- Implemented in C99 and Python without external dependencies.
- Extensive unit tests using C++ and Google Test.
- Supports the Native Client target and outperforms native Caffe/CPU when running inside Chrome.
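The Fourier-transform algorithms rely on the convolution theorem: convolving two sequences is equivalent to multiplying their spectra element-wise. The following pure-Python sketch illustrates this for 1D signals using a textbook radix-2 FFT. It is an assumption-laden teaching example, not NNPACK's FFT-8x8 or FFT-16x16 kernels, whose names suggest small 2D tiles. The function names are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    The inverse transform is unscaled (the caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(x, h):
    """Full linear convolution of real sequences x and h via the FFT."""
    m = len(x) + len(h) - 1
    n = 1
    while n < m:
        n *= 2
    # Zero-pad both inputs to n >= m so circular convolution equals linear.
    fx = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    fh = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(fx, fh)], invert=True)
    return [v.real / n for v in y[:m]]


def direct_convolve(x, h):
    """Reference: y[k] = sum_i x[i] * h[k - i]."""
    m = len(x) + len(h) - 1
    return [sum(x[i] * h[k - i] for i in range(len(x)) if 0 <= k - i < len(h))
            for k in range(m)]


print(fft_convolve([1, 2, 3], [0, 1, 0.5]))
# approximately [0.0, 1.0, 2.5, 4.0, 1.5]
```

CNN layers compute correlation, which is convolution with a reversed kernel, so `fft_convolve(x, h[::-1])` gives the correlation result. Tiling with fixed small transforms, as the FFT-8x8 and FFT-16x16 names imply, keeps transforms cache-resident instead of padding the whole input.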
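Supported layers and their entry points:

- Convolutional layer
  - Forward propagation for training (`nnp_convolution_output`)
  - Backward input-gradient computation (`nnp_convolution_input_gradient`)
  - Backward kernel-gradient computation (`nnp_convolution_kernel_gradient`)
  - Forward propagation for inference (`nnp_convolution_inference`)
- Fully-connected layer
  - Forward propagation for training (`nnp_fully_connected_output`)
  - Forward propagation for inference (`nnp_fully_connected_inference`)
- Max pooling layer: forward propagation (`nnp_max_pooling_output`)
- ReLU layer
  - Forward propagation (`nnp_relu_output`)
  - Backward input-gradient computation (`nnp_relu_input_gradient`)
- Softmax layer: forward propagation (`nnp_softmax_output`)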
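The non-convolutional layers listed above compute standard operations. The following plain-Python reference definitions can serve as a correctness oracle when testing a binding. The function names are hypothetical and deliberately differ from NNPACK's C API, whose actual signatures are declared in `include/nnpack.h`. The 2x2, stride-2 pooling window is likewise an illustrative assumption.

```python
import math


def relu_forward(x):
    """ReLU: max(v, 0) element-wise."""
    return [max(v, 0.0) for v in x]


def relu_input_gradient(grad_output, x):
    """Gradient flows back only where the forward input was positive."""
    return [g if v > 0 else 0.0 for g, v in zip(grad_output, x)]


def softmax(x):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def max_pool_2x2(image):
    """2x2 window, stride 2, on a 2D list with even height and width."""
    return [[max(image[y][x], image[y][x + 1], image[y + 1][x], image[y + 1][x + 1])
             for x in range(0, len(image[0]), 2)]
            for y in range(0, len(image), 2)]


def fully_connected(x, weights, bias=None):
    """y[i] = dot(weights[i], x) (+ bias[i]); one weight row per output."""
    y = [sum(w * v for w, v in zip(row, x)) for row in weights]
    if bias is not None:
        y = [a + b for a, b in zip(y, bias)]
    return y
```

For example, `max_pool_2x2` maps the 4x4 grid of 1 through 16 to `[[6, 8], [14, 16]]`.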
NNPACK can be built on OS X and Linux.
Install the ninja build system:

```bash
sudo apt-get install ninja-build || brew install ninja
```

Install the PeachPy assembler and the confu configuration system:

```bash
[sudo] pip install --upgrade git+https://github.com/Maratyszcza/PeachPy
[sudo] pip install --upgrade git+https://github.com/Maratyszcza/confu
```

Then clone NNPACK, install dependencies, configure, and build:

```bash
git clone https://github.com/Maratyszcza/NNPACK.git
cd NNPACK
confu setup
python ./configure.py
ninja
```
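To build for Android with the Android NDK, add `ndk-build` to the `PATH` variable, set up the dependencies from the NNPACK directory (`confu setup`), and then build NNPACK with the `ndk-build` build system.

To build for WebAssembly, use `emsdk` to download, build, and activate the `incoming` version of Emscripten and Binaryen, and set up the environment variables. `$EMSCRIPTEN` should specify the path to the activated Emscripten environment. Then configure NNPACK with the `--target=wasm` option.

To build for Asm.js, use `emsdk` to download, build, and activate one of the environments, and set up the environment variables. `$EMSCRIPTEN` should specify the path to the activated Emscripten environment. Then configure NNPACK with the `--target=asmjs` option.

To build for Portable Native Client, set the `NACL_SDK_ROOT` variable to a versioned SDK directory (e.g. `/opt/nacl_sdk/pepper_49`), then configure NNPACK with the `--target=pnacl` option.

To build for x86-64 Native Client, set the `NACL_SDK_ROOT` variable to a versioned SDK directory (e.g. `/opt/nacl_sdk/pepper_49`), then configure NNPACK with the `--target=x86_64-nacl-newlib` (recommended) or `--target=x86_64-nacl-gnu` option.

NNPACK contains an extensive test suite for transformations and neural network layers.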
After configuration, type `ninja smoketest` to run a set of quick tests, or `ninja test` to additionally test NNPACK layers with parameters from the AlexNet, VGG-A, and OverFeat-Fast networks (this takes a while).
Binary packages need to distribute two files: `include/nnpack.h` and `lib/libnnpack.a` (also `lib/libnnpack.so` or `lib/libnnpack.dylib` if NNPACK was configured with shared library support).
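The following minimal Python sketch checks a staged package directory against that file list before publishing. The helper name `missing_package_files` and the single-root directory layout are assumptions for illustration; NNPACK itself does not ship this check.

```python
from pathlib import Path

# Files every binary package needs, per the list above.
REQUIRED = ["include/nnpack.h", "lib/libnnpack.a"]
# With shared library support, one of these must also be present.
SHARED = ["lib/libnnpack.so", "lib/libnnpack.dylib"]


def missing_package_files(root, shared=False):
    """Return descriptions of required files absent under `root`."""
    root = Path(root)
    missing = [p for p in REQUIRED if not (root / p).is_file()]
    if shared and not any((root / p).is_file() for p in SHARED):
        missing.append(" or ".join(SHARED))
    return missing
```

An empty list means the package contains everything it needs. Pass `shared=True` when NNPACK was configured with shared library support.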
Caffe integration is available in the `nnpack-pr` branch of ajtulloch/caffe.

The library is developed by Marat Dukhan of Georgia Tech with extensive advice from Nicolas Vasilache and Soumith Chintala of Facebook Artificial Intelligence Research. Andrew Tulloch of Facebook Artificial Intelligence Research contributed Caffe integration. We thank Andrew Lavin for fruitful discussions on Winograd transform-based implementations. NNPACK is a research project at Richard Vuduc's HPC Garage lab in the Georgia Institute of Technology, College of Computing, School of Computational Science and Engineering.
This material is based upon work supported by the U.S. National Science Foundation (NSF) Award Number 1339745. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of NSF.