update: I just realized that I had previously used a binary built without NNPACK. Now I'm using the right binary, but I observe that the network with NNPACK is slower (the inference time is roughly 3x that of the network without NNPACK). Any advice?
Hmm, it might be because NNPACK was not built with optimized flags. Since NNPACK is C code, you might want to change CMAKE_C_FLAGS in addition to the CMAKE_CXX_FLAGS you mentioned in the other issue - want to give that a shot?
Also, you might want to check whether the default build script targets your platform: the default is -march=armv7-a -mfloat-abi=softfp -mfpu=neon, and if you are on aarch64 you may want to change it.
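For example, if you configure with CMake directly, something along these lines (the -O3 and -march=armv8-a values are only an illustration for an aarch64 toolchain; adjust them to your CPU, or put the same flags wherever your build script sets CMAKE_CXX_FLAGS):

# illustration only: optimized C and C++ flags for an aarch64 target
cmake .. \
  -DUSE_NNPACK=ON \
  -DCMAKE_C_FLAGS="-O3 -march=armv8-a" \
  -DCMAKE_CXX_FLAGS="-O3 -march=armv8-a"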
hi @kindloaf, @Yangqing, could you tell me how to properly convert a pretrained model so that NNPACK is enabled? I built an iOS demo and manually called op.set_engine("NNPACK") for the Conv layers; however, when I traced the function calls, I saw that the conv ops were still executed by Eigen. Did I miss something? By the way, I used the stock Predictor class in C++, which never seemed to switch to other engines, neither CUDNN nor NNPACK.
@power0341 Did you do the following? (1) compile the binary with NNPACK enabled (make sure the switch is on in CMakeLists.txt); (2) use the Python script on this page to convert the pre-trained model to use the "NNPACK" and "BLOCK" engines.
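The conversion itself just rewrites the engine field of the ops in predict_net.pb. A minimal sketch of that step, assuming the standard caffe2_pb2 protobuf bindings (file names are only examples, and the full script also sets the "BLOCK" engine where needed):

from caffe2.proto import caffe2_pb2

net = caffe2_pb2.NetDef()
with open("predict_net.pb", "rb") as f:
    net.ParseFromString(f.read())

# Route the convolutions to NNPACK; other engines are set the same way.
for op in net.op:
    if op.type == "Conv":
        op.engine = "NNPACK"

with open("predict_net_nnpack.pb", "wb") as f:
    f.write(net.SerializeToString())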
@kindloaf I did switch on the NNPACK option in CMakeLists.txt. This led to two additional libraries, "libCAFFE2_NNPACK.a" and "libCAFFE2_PTHREADPOOL.a", being created.
For the second step, which is the one that confused me the most, I did it in C++ source code: specifically, I loaded the NetDef predict_net, set the engine of the convolution layers to "NNPACK", and then created the net as a Predictor object. When printing predictor->def().DebugString(), we see that
arg {
name: "engine"
s: "NNPACK"
}
I believe this is equivalent to the Python version. Any suggestions?
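Roughly, the C++ side looks like the sketch below (an outline rather than my exact code; header paths and the Predictor constructor may differ between Caffe2 versions):

#include "caffe2/core/predictor.h"
#include "caffe2/utils/proto_utils.h"

int main() {
  caffe2::NetDef init_net, predict_net;
  // paths are placeholders for wherever the .pb files live
  CAFFE_ENFORCE(caffe2::ReadProtoFromFile("init_net.pb", &init_net));
  CAFFE_ENFORCE(caffe2::ReadProtoFromFile("predict_net.pb", &predict_net));

  // point every convolution at the NNPACK engine before the net is built;
  // editing this local NetDef after the Predictor exists has no effect
  for (int i = 0; i < predict_net.op_size(); ++i) {
    auto* op = predict_net.mutable_op(i);
    if (op->type() == "Conv") {
      op->set_engine("NNPACK");
    }
  }

  // the Predictor is constructed from the modified NetDefs, so the engine
  // choice above is what should route the Conv ops to NNPACK (assuming the
  // binary itself was built with NNPACK enabled)
  caffe2::Predictor predictor(init_net, predict_net);
  return 0;
}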
@power0341 From the debug string, it seems right to me. Not sure why it's still using Eigen...
@power0341 Hi, how did you set the engine to "NNPACK" in the C++ source code? I am trying to do this too. Could you share your code?
@kindloaf I also observed that the network with NNPACK is slower. Did you make any progress on that?
Hi @raininglixinyu I don't have any update on this.
@kindloaf @Yangqing @power0341 Hi, how did you set the engine to "NNPACK" in the C++ source code? I am trying to do this too. Could you share your code?
I had missed the "--recursive" option when I cloned the repository. You should run "git clone --recursive https://github.com/caffe2/caffe2.git" so that the third-party repositories in "https://github.com/caffe2/caffe2/tree/master/third_party" are cloned as well.
Hope it works.
I can run inference code with NNPACK as the engine, but how can I set the thread pool size used by NNPACK? By default it is the total number of CPU cores, but I don't want to use all of the cores for this test. What should I do?
I'm trying to run inference on Android with NNPACK enabled. I did the following, but didn't notice any performance difference: (1) made sure USE_NNPACK is ON in CMakeLists.txt; (2) took the .pb files from the model zoo and converted them to use NNPACK, according to this page. I saw that init_net.pb wasn't changed, but predict_net.pb was. The inference time of bvlc_googlenet on my device was the same before and after the conversion. Am I missing something?