Closed · miklasr closed this issue 4 years ago
Hello miklasr,
thank you for your question and interest in Pico-CNN.
The first way to improve inference performance is to enable compiler optimization (which has only been enabled by default for a couple of days, so you might have an "older" version running right now). You can add any desired optimization flags to the `CFLAGS` variable in the Makefile of the Pico-CNN library (`pico-cnn/pico-cnn/Makefile`) as well as in the generated Makefile (`pico-cnn/onnx_import/generated_code/bvlcalexnet-9/Makefile`). For example, we would suggest using `-flto -O3 -march=native` (this is also the default when using the latest commit).
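For illustration, the relevant part of such a Makefile could look roughly like this; the compiler choice, warning flags, and build rule below are just placeholders, the important bit is where the optimization flags go:

```makefile
# Illustrative excerpt only -- compiler choice, warning flags and the build
# rule are placeholders; the point is that the optimization flags are part
# of CFLAGS so every translation unit is compiled with them.
CC      = gcc
CFLAGS  = -Wall -std=c11 -flto -O3 -march=native

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
```

Since `-flto` takes effect at link time, it helps to rebuild both the Pico-CNN library and the generated network code after changing the flags, so the optimization applies across the whole binary.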
Moreover, it is important to keep in mind that the current version of Pico-CNN only supports single-core CPU inference, whereas Microsoft's ONNX Runtime is actually C++ with Python bindings and probably uses at least multi-core execution and vectorization on your system. We are currently working on v2 of Pico-CNN, which will incorporate the ARM Compute Library to improve performance on ARM architectures and add multi-core support on x86 architectures.
Finally, Pico-CNN is intended to be a very lightweight framework consisting of as few files and functions as possible. We use it in teaching and in research, as you don't have to worry about the compatibility of third-party libraries, etc.
If you have any further questions, feel free to ask!
Many thanks for your response and for taking the time to highlight the reasons for the differing inference times. I would be interested to see how performance improves with a future version 2.
I have tried out this framework with an AlexNet ONNX model (obtained from here, as suggested in the README). My reason for using this is to run ONNX models independently of the hardware platform, ideally with high inference speeds.
Setup
Running
I used the dummy input program to test my installation and to get an idea of how quick this framework is. I used
`./dummy_input network.weights.bin NUM_RUNS GENERATE_ONCE`
to do this. It took a surprisingly long time to execute, so I decided to investigate a little. I modified the `dummy_input.c` file to time how long the call to `network()` takes, which I have assumed is the call that runs the inference. This took around 3 s on average (I used 50 runs to get a mean inference time). I have a similar testing setup with Microsoft's ONNX Runtime in Python for comparison, which gave mean inference times of 12 ms.
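In case it is useful, the measurement I added was essentially the following, as a simplified sketch: `run_inference()` below stands in for the generated `network()` call, whose actual arguments I have left out.

```c
/* timing_sketch.c -- minimal sketch of how the network() call was timed.
 * The real measurement lives inside dummy_input.c; run_inference() is a
 * placeholder for the generated network() call and its arguments. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

static void run_inference(void) {
    /* placeholder for: network(...); as generated by onnx_import */
}

int main(void) {
    const int num_runs = 50;
    double total_ms = 0.0;

    for (int i = 0; i < num_runs; i++) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        run_inference();
        clock_gettime(CLOCK_MONOTONIC, &end);
        total_ms += (end.tv_sec - start.tv_sec) * 1000.0 +
                    (end.tv_nsec - start.tv_nsec) / 1.0e6;
    }
    printf("mean inference time: %.3f ms over %d runs\n",
           total_ms / num_runs, num_runs);
    return 0;
}
```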
Questions
Thanks in advance!