ctuning/ck-request-asplos18-mobilenets-tvm-arm

This repository contains the experimental workflow and all related artifacts, packaged as portable, customizable and reusable Collective Knowledge (CK) components, for image classification from the 1st ReQuEST tournament at ASPLOS'18 on reproducible SW/HW co-design of deep learning (speed, accuracy, energy, costs).

References

Title: Optimizing Deep Learning Workloads on ARM GPU with TVM
Authors: Lianmin Zheng, Tianqi Chen
ACM paper
ACM artifact
arXiv ReQuEST goals
ReQuEST submission and reviewing guidelines
ReQuEST workflows
ReQuEST scoreboard

Artifact check-list

Details: Link

Algorithm: image classification with ResNet-18, MobileNet and VGG16
Program: TVM/NNVM, ARM Compute Library, MXNet, OpenBLAS
Compilation: g++
Transformations:
Binary: will be compiled on a target platform
Data set: ImageNet 2012 validation (50,000 images)
Run-time environment: Linux with OpenCL
Hardware: Firefly-RK3399 with ARM Mali-T860MP4 or other boards with ARM Mali GPUs
Run-time state: set by our scripts
Execution: inference speed
Metrics: total execution time; top1/top5 accuracy over some or all images from the data set
Output: classification result; execution time; accuracy
Experiments: benchmarking the inference speed of different backends on ImageNet (automated via CK command line)
How much disk space is required (approximately)? ~4GB
How much time is needed to prepare workflow (approximately)? several hours (mainly native compilation of packages)
How much time is needed to complete experiments (approximately)? hours for full ImageNet accuracy validation (50000 images)
Publicly available?: Yes
Code license(s)?: MIT license
CK workflow framework used? Yes
CK workflow URL: https://github.com/ctuning/ck-request-asplos18-mobilenets-tvm-arm
CK results URL: https://github.com/ctuning/ck-request-asplos18-results-mobilenets-tvm-arm
Original artifact: https://github.com/merrymercy/tvm-mali

Installation

NB: The # sign means sudo.

Install global prerequisites (Ubuntu)

# apt-get install libtinfo-dev

# pip install numpy scipy decorator matplotlib
or
# pip3 install numpy scipy decorator matplotlib

Minimal CK installation

The minimal installation requires:

Python 2.7 or 3.3+ (the limitation is mainly due to unit tests)
Git command line client.
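
The Python requirement above can be checked before installing CK. The following is a minimal sketch, not part of CK: the helper name `python_version_ok` is hypothetical, and it assumes the interpreter of interest is the `python` found on PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of CK): succeeds if the given "major.minor"
# version satisfies the requirement stated above (2.7, or 3.3 and later).
python_version_ok() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    case "$major" in
        2) [ "$minor" -eq 7 ] ;;
        3) [ "$minor" -ge 3 ] ;;
        *) return 1 ;;
    esac
}

# Example: query the interpreter found on PATH, if there is one.
if command -v python >/dev/null 2>&1; then
    ver=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if python_version_ok "$ver"; then
        echo "Python $ver is supported"
    else
        echo "Python $ver is not supported"
    fi
fi
```

Replace `python` with `python3` in the example if that is the interpreter you plan to use with CK.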

You can install CK in your local user space as follows:

$ git clone http://github.com/ctuning/ck
$ export PATH=$PWD/ck/bin:$PATH
$ export PYTHONPATH=$PWD/ck:$PYTHONPATH

You can also install CK via PIP with sudo to avoid setting up environment variables yourself:

$ sudo pip install ck
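
After either installation method, it may help to confirm that the required tools resolve on PATH before continuing. This is only a sketch: the helper name `missing_cmds` is hypothetical and is not provided by CK.

```shell
#!/bin/sh
# Hypothetical helper (not part of CK): print each named command that is
# not found on PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# git is needed for the minimal installation; ck should resolve once one of
# the installation methods above has completed.
missing=$(missing_cmds git ck)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "git and ck are available"
fi
```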

Install this CK repository with all dependencies (other CK repos to reuse artifacts)

$ ck pull repo:ck-request-asplos18-mobilenets-tvm-arm

Install this CK workflow from the ACM Digital Library snapshot

It is possible to install and test the snapshot of this workflow from the ACM Digital Library without interfering with your current CK installation. Download related file "request-asplos18-artifact-?-ck-workflow.zip" to a temporary directory, unzip it and then execute the following commands:

$ . ./prepare_virtual_ck.sh
$ . ./start_virtual_ck.sh

All CK repositories will be installed in your current directory. You can now proceed with further evaluation as described below.

Detect and test OpenCL driver

$ ck detect platform.gpgpu --opencl

Install libBLAS

$ sudo apt-get install libblas*

To detect and register it in CK:
```
$ ck detect soft:lib.blas
```
To check the environment:
```
$ ck show env --tags=blas,no-openblas
```

A possible output:

cfe1e23a4472bb1d linux-32 32 BLAS library api-3 32bits,blas,blas,cblas,host-os-linux-32,lib,no-openblas,target-os-linux-32,v0,v0.3

Install OpenBLAS

$ ck install package:lib-openblas-0.2.18-universal

To list other OpenBLAS versions available via CK:

$ ck list package:lib-openblas*

Install LAPACK

$ ck install package:lib-lapack-3.4.2

Install or detect LLVM/Clang compiler v4+

$ ck install package --tags=compiler,llvm

Although the above is the suggested method, you can also install LLVM via apt and then detect it via CK.

# apt-get install llvm-4.0 clang-4.0
$ ck detect soft:compiler.llvm

Packages installation

ARM Compute Library

$ ck install package:lib-armcl-opencl-17.12 --env.USE_GRAPH=ON --env.USE_NEON=ON --env.USE_EMBEDDED_KERNELS=ON

To check or install other versions available via CK:

$ ck list package:lib-armcl-opencl-*
$ ck install package --tags=lib,armcl --env.USE_GRAPH=ON --env.USE_NEON=ON --env.USE_EMBEDDED_KERNELS=ON

MXNet with OpenBLAS

$ ck install package:lib-mxnet-master-cpu --env.USE_F16C=0

NNVM / TVM

$ ck install package:lib-nnvm-tvm-master-opencl

Original benchmarking (no real classification)

ARM Compute Library client (OpenCL)

This program must first be compiled:

$ ck compile program:request-armcl-inference

and then executed as follows:

$ ck run program:request-armcl-inference --cmd_key=all

You can also use the "ck benchmark" command to automatically set the CPU/GPU frequency to maximum, compile the program, run it N times and perform statistical analysis on the empirical characteristics:

$ ck benchmark program:request-armcl-inference --cmd_key=all
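
To illustrate what summarizing N repeated runs involves, here is a small sketch that is independent of CK. It does not reproduce the statistical analysis that "ck benchmark" performs; the helper name `summarize_times` and the sample timings are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not CK's implementation): read one execution time in
# seconds per line on stdin and print the count, minimum, mean and maximum.
summarize_times() {
    awk '
        NR == 1 { min = $1; max = $1 }
        {
            sum += $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            printf "n=%d min=%.4f mean=%.4f max=%.4f\n", NR, min, sum / NR, max
        }'
}

# Made-up timings, for illustration only.
printf '%s\n' 0.25 0.50 0.75 | summarize_times
```

Reporting the minimum and maximum alongside the mean makes run-to-run variation visible, which matters on boards where frequency scaling or thermal throttling can affect timings.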

We validated results from the authors:

backend: ARMComputeLib-mali model: vgg16    conv_method: gemm   dtype: float32  cost: 1.6511
backend: ARMComputeLib-mali model: vgg16    conv_method: gemm   dtype: float16  cost: 0.976307
backend: ARMComputeLib-mali model: vgg16    conv_method: direct dtype: float32  cost: 3.99093
backend: ARMComputeLib-mali model: vgg16    conv_method: direct dtype: float16  cost: 1.61435
backend: ARMComputeLib-mali model: mobilenet    conv_method: gemm   dtype: float32  cost: 0.172009
backend: ARMComputeLib-mali model: mobilenet    conv_method: direct dtype: float32  cost: 0.174635

Extra info: CK program meta

MXNet with OpenBLAS client (CPU)

$ ck run program:request-mxnet-inference --cmd_key=all
or
$ ck benchmark program:request-mxnet-inference --cmd_key=all

We validated results from the authors:

backend: MXNet+OpenBLAS model: resnet18 dtype: float32  cost:0.4145
backend: MXNet+OpenBLAS model: mobilenet    dtype: float32  cost:0.3408
backend: MXNet+OpenBLAS model: vgg16    dtype: float32  cost:3.1244

Extra info: CK program meta and code

NNVM/TVM client (OpenCL)

$ ck run program:request-tvm-nnvm-inference --cmd_key=all
or
$ ck benchmark program:request-tvm-nnvm-inference --cmd_key=all

We validated results from the authors:

backend: TVM-mali   model: vgg16    dtype: float32  cost:0.9599
backend: TVM-mali   model: vgg16    dtype: float16  cost:0.5688
backend: TVM-mali   model: resnet18 dtype: float32  cost:0.1748
backend: TVM-mali   model: resnet18 dtype: float16  cost:0.1122
backend: TVM-mali   model: mobilenet    dtype: float32  cost:0.0814
backend: TVM-mali   model: mobilenet    dtype: float16  cost:0.0525
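
The result lines above can be post-processed with standard tools. As a sketch, the following computes, per model, the ratio of the float32 cost to the float16 cost from the TVM lines listed above. The helper name `fp16_speedup` is hypothetical, and the sketch assumes that a lower cost is better and that each line follows the `key: value` layout shown.

```shell
#!/bin/sh
# Hypothetical helper: read "backend: ... model: M dtype: D cost:C" lines on
# stdin and print, per model, the float32 cost divided by the float16 cost.
fp16_speedup() {
    awk '
        {
            model = ""; dtype = ""; cost = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "model:") model = $(i + 1)
                else if ($i == "dtype:") dtype = $(i + 1)
                else if ($i == "cost:") cost = $(i + 1)           # "cost: 1.23"
                else if ($i ~ /^cost:./) cost = substr($i, 6)     # "cost:1.23"
            }
            if (model == "" || dtype == "" || cost == "") next
            if (!(model in seen)) { seen[model] = 1; order[++n] = model }
            costs[model, dtype] = cost
        }
        END {
            # Print models in the order they first appeared in the input.
            for (k = 1; k <= n; k++) {
                m = order[k]
                if ((m, "float32") in costs && (m, "float16") in costs)
                    printf "%s %.2f\n", m, costs[m, "float32"] / costs[m, "float16"]
            }
        }'
}

# Input: the validated TVM results listed above.
fp16_speedup <<'EOF'
backend: TVM-mali   model: vgg16    dtype: float32  cost:0.9599
backend: TVM-mali   model: vgg16    dtype: float16  cost:0.5688
backend: TVM-mali   model: resnet18 dtype: float32  cost:0.1748
backend: TVM-mali   model: resnet18 dtype: float16  cost:0.1122
backend: TVM-mali   model: mobilenet    dtype: float32  cost:0.0814
backend: TVM-mali   model: mobilenet    dtype: float16  cost:0.0525
EOF
```

The parser accepts both `cost: 1.23` and `cost:1.23`, since both spellings appear in the result listings in this document.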

Extra info: CK program meta and code

Real classification (time and accuracy)

The original benchmarking clients in this ReQuEST submission did not include real classification. We therefore provided code for real image classification for each of the above CK programs. This is also required to calculate model accuracy on all (or a subset) of the ImageNet data set.

MXNet with OpenBLAS client (CPU)

You can benchmark classification using MXNet with OpenBLAS as follows:

$ ck benchmark program:request-mxnet-inference --cmd_key=classify

You can also install ImageNet data sets for model accuracy validation as follows:

$ ck install package:imagenet-2012-val
or
$ ck install package:imagenet-2012-val-min-resized

$ ck install package:imagenet-2012-aux

You can then run the accuracy test as follows:

$ ck run program:request-mxnet-inference --cmd_key=test --env.STAT_REPEAT=1
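
For reference, top-1 and top-5 accuracy can be computed from per-image predictions as sketched below. The input format (the true label followed by five predicted labels ranked best first), the helper name `topk_accuracy` and the sample labels are assumptions for illustration; this is not the format emitted by the CK programs.

```shell
#!/bin/sh
# Hypothetical helper: each stdin line is "<true_label> <p1> ... <p5>", where
# p1..p5 are the predicted labels ranked from most to least likely.
topk_accuracy() {
    awk '
        NF >= 2 {
            total++
            # Top-1: the highest-ranked prediction matches the true label.
            if ($2 == $1) top1++
            # Top-5: the true label is among the first five predictions.
            for (i = 2; i <= NF && i <= 6; i++)
                if ($i == $1) { top5++; break }
        }
        END {
            if (total == 0) { print "no samples"; exit 1 }
            printf "top1=%.3f top5=%.3f (%d images)\n", top1 / total, top5 / total, total
        }'
}

# Made-up labels, for illustration only.
topk_accuracy <<'EOF'
cat cat dog fox owl bee
dog cat dog fox owl bee
fox cat dog owl bee ant
owl owl cat dog fox bee
EOF
```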

You can find raw accuracy results (top1/top5) for several models here.

Extra info: CK program meta and code

NNVM/TVM client (OpenCL)

You can benchmark classification and test accuracy using TVM/NNVM as follows:

$ ck benchmark program:request-tvm-nnvm-inference --cmd_key=classify
$ ck run program:request-tvm-nnvm-inference --cmd_key=test --env.STAT_REPEAT=1

You can find raw accuracy results (top1/top5) for several models here.

Extra info: CK program meta and code

ARM Compute Library client (OpenCL)

ReQuEST promotes reusability of AI/ML workflows, packages and artifacts using the CK framework. Since image classification using ArmCL was already implemented and shared in the CK format and added to the ReQuEST scoreboard, we can simply reuse this workflow and compare against public results!

Please follow this README to reproduce the ArmCL classification results on Firefly-RK3399!

Validated results

Validated experimental results were recorded and processed using the following scripts (we plan to automate this further for future ReQuEST editions):

$ ck find script:benchmark-request-tvm-arm

They are now available in this CK repo and on the public ReQuEST scoreboard.

Reviewers

This workflow was converted to CK and validated by the following reviewers:

See accepted results on the live scoreboard

Link