ekut-es / pico-cnn

Lightweight C implementation of CNNs for Embedded Systems

Slow inference time #5

Closed: miklasr closed this issue 4 years ago

miklasr commented 4 years ago

I have tried out this framework with an AlexNet ONNX model (obtained from here, as suggested in the README). My reason for using pico-cnn is to run ONNX models independently of the hardware platform, ideally with high inference speed.

Setup

git clone https://github.com/ekut-es/pico-cnn
# Dependency for Ubuntu
sudo apt install libjpeg-dev

# Set up virtual environment and install requirements
conda create -n pico-cnn python=3.6.5
conda activate pico-cnn
cd pico-cnn/onnx_import
pip install -r requirements.txt

# Set up the ONNX model
wget https://github.com/onnx/models/blob/master/vision/classification/alexnet/model/bvlcalexnet-9.onnx?raw=true -O $DESTINATION/bvlcalexnet-9.onnx
python onnx_to_pico_cnn.py --input $DESTINATION/bvlcalexnet-9.onnx

cd generated_code/bvlcalexnet-9
make

Running

I used the dummy input program to test my installation and to get an idea of how fast this framework is. I ran ./dummy_input network.weights.bin NUM_RUNS GENERATE_ONCE to do this. It took a surprisingly long time to execute, so I decided to investigate a little. I modified dummy_input.c to time how long the call to network() takes, which I assume is the call that runs the inference. This took around 3 s on average (I used 50 runs to get a mean inference time).
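For reference, my timing looks roughly like the sketch below. It is simplified: run_inference() is just a stand-in so the snippet compiles on its own; in the actual dummy_input.c I wrap the generated network(...) call (with its input/output buffers) in the same way.

#include <stdio.h>
#include <time.h>

/* Stand-in for the generated network(...) call so this sketch compiles on
 * its own; in dummy_input.c the real call with its buffers goes here. */
static void run_inference(void) {
    /* network(input, output, ...); */
}

int main(void) {
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    run_inference();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ms = (end.tv_sec - start.tv_sec) * 1000.0
              + (end.tv_nsec - start.tv_nsec) / 1.0e6;
    printf("inference time: %.2f ms\n", ms);

    return 0;
}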

For comparison, I have a similar test setup with Microsoft's ONNX Runtime in Python, which gave a mean inference time of 12 ms.

Questions

  1. Is this long inference time to be expected with pico-cnn?
  2. Are there any options/flags that I forgot to set that would speed up inference?

Thanks in advance!

alexjung commented 4 years ago

Hello miklasr,

thank you for your question and interest in Pico-CNN.

The first way to improve inference performance would be to enable optimizations during compilation (this has only been enabled by default for a couple of days, so you might be running an "older" version right now). You can add any desired optimization flags to the CFLAGS variable in the Makefile of the Pico-CNN library (pico-cnn/pico-cnn/Makefile) as well as in the generated Makefile (pico-cnn/onnx_import/generated_code/bvlcalexnet-9/Makefile). For example, we would suggest -flto -O3 -march=native (this is also the default when using the latest commit).
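In both Makefiles this boils down to something like the following line (the other flags already present may differ between versions, so treat this only as an illustration):

# pico-cnn/pico-cnn/Makefile and the generated Makefile: append to CFLAGS
CFLAGS += -flto -O3 -march=native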

Moreover, it is important to keep in mind that the current version of Pico-CNN only supports single-core CPU inference, whereas Microsoft's ONNX Runtime is actually C++ with Python bindings and probably uses at least multi-core execution and vectorization on your system. We are currently working on v2 of Pico-CNN, which will incorporate the ARM Compute Library to improve performance on ARM architectures and will add multi-core support on x86 architectures.

Finally, Pico-CNN is intended to be a very lightweight framework consisting of as few files and functions as possible. We use it in teaching as well as in research, since you don't have to worry about the compatibility of third-party libraries, etc.

If you have any further questions feel free to ask!

miklasr commented 4 years ago

Many thanks for your response and for taking the time to highlight the reasons for the differing inference times. I would be interested to see how performance improves with the upcoming version 2.