XiaoMi / mace

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
Apache License 2.0

Support cross compilation for ARM Linux #36

Closed llhe closed 5 years ago

llhe commented 6 years ago

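If you need this, please say so here. It would also help if you could provide:

  1. Chip type and specifications
  2. Linux version
  3. Cross-compiler version
  4. If the GPU is supported, the OpenCL version and the path of the shared library (32/64-bit)

zhy520xp commented 6 years ago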

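For cross compilation, there should be no need to provide the chip type and specifications. In principle, as long as the cross-compilation toolchain supports C++11 (which MACE uses), a tutorial on how to cross-compile is all that is needed. Besides, there are many kinds of chips, and each one has a different cross-compilation toolchain.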

leogift commented 6 years ago

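Thanks for the great work on MACE. There is huge demand for ARM Linux; supporting armv8/aarch64 plus Mali would already cover most cases. One example:

  1. RK3399, dual A72 + 4 A53
  2. Ubuntu 16.04
  3. aarch64-linux-g**
  4. OpenCL 1.2

Thanks again.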

llhe commented 6 years ago

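@zhy520xp We need a guide for the bazel toolchain; or, if you have gotten it working, please report back.

We also need to confirm the state of OpenCL support and whether compatibility is feasible, since systems differ a lot. We have no development environment for this, so we would like to collect information. Likewise, if you get it working, please let us know. Many thanks.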

colorfulCloud commented 6 years ago

as @leogift mentioned, rk3399 is widely used in industry

xiaqing10 commented 6 years ago

Indeed, RK3399 support is really needed.

xiaqing10 commented 6 years ago

https://github.com/zhy520xp/mace-makefile-project has cross compilation working; you can use it as a reference.

zhy520xp commented 6 years ago

For embedded platforms like the 3559A and 3536, how can we build something that runs on the GPU? Should these be supported as well?

zhy520xp commented 6 years ago

https://github.com/zhy520xp/mace-makefile-project can now run on the GPU of the 3288 and 3399.

hbwangjinwu commented 6 years ago

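The bazel build is currently stuck on compiling protobuf. The MACE build invokes the tool produced by cross-compiling protobuf, which fails with an exec format error (the tool is built for the target, so it cannot run on the build host).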

hbwangjinwu commented 6 years ago

I built the MACE library with bazel on ARM Linux, but it takes a long time to run. Is NEON enabled by default in the bazel rules?

hbwangjinwu commented 6 years ago

@llhe I have mostly worked out how to build with bazel, although the run time on the 3516D is not ideal. I can share the method I put together.

llhe commented 6 years ago

@hbwangjinwu Great, could you send a PR? NEON has to be enabled explicitly; see the macro definition here: https://github.com/XiaoMi/mace/blob/master/mace/kernels/arm/conv_2d_neon_15x1.cc#L76

hbwangjinwu commented 6 years ago

https://github.com/hbwangjinwu/mace_cross_compile_guide contains the guide and my cross-compilation settings. @llhe I still don't know how to enable -DMACE_ENABLE_NEON through the bazel rules.

llhe commented 6 years ago

Add --define neon=true to the bazel command.

madhavajay commented 6 years ago

What about the ASUS TinkerBoard with RK3288?

madhavajay commented 6 years ago

I got it working on the TinkerBoard and made a fork of the Makefile repo with English instructions: https://github.com/madhavajay/mace-makefile-project

https://github.com/XiaoMi/mace/issues/167

shuxiao9058 commented 6 years ago

@madhavajay Hi, I tried running the demo program from mace-makefile-project. Why is the CPU actually faster than the GPU?

zhy520xp commented 6 years ago

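@shuxiao9058 Generally speaking, when a mobile GPU and CPU run the same model, the GPU is somewhat slower. For example, the GPU on the 3288 delivers only about half the performance of its CPU. This depends heavily on the platform: specifically, which CPU and GPU models you have. The number of GPU cores also matters a lot; more cores means faster. For example, with the same Mali-G71, 4 cores are much faster than 2.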

madhavajay commented 6 years ago

@shuxiao9058 @zhy520xp I don't know why the performance is what it is; perhaps I can improve it with different compile flags? However, 6 fps is much better than the 0.8-1 fps on an RPi 3, so I think the acceleration from MACE is fairly impressive. Any suggestions for further speedups would be appreciated. I read that Winograd is theoretically 2.2x faster?

zhy520xp commented 6 years ago

@madhavajay If your model is deployed on the CPU, you can try NCNN (https://github.com/Tencent/ncnn). NCNN uses Winograd and 8-bit quantization for convolution. It is worth a try!

madhavajay commented 6 years ago

@zhy520xp okay great I will look at it, thank you! :)

madhavajay commented 6 years ago

@zhy520xp do you know if it supports SSD MobileNet architecture?

llhe commented 6 years ago

We will add Linaro toolchain soon which will enable official cross compiling for ARM Linux.

llhe commented 6 years ago

A default Linaro-based ARM cross-compiler toolchain will be added soon, enabling cross compilation for ARM Linux.

tataganesh95 commented 6 years ago

@llhe Would the MACE documentation be updated with this? And can the build command be added to build-standalone-lib.sh? Thank you!

llhe commented 6 years ago

@tataganesh95 It will be integrated into the tools soon, and the documents will be updated accordingly.

Before that, you can simply build it with the following commands (not tested yet):

bazel build -s --config aarch64_linux --define openmp=true --define opencl=true --define neon=true //mace/libmace:libmace.so

Note: there is an issue with armeabi-v7a + NEON, which will be resolved soon.

madhavajay commented 6 years ago

@llhe you guys are awesome! Will test this soon. I assume that this will build on Ubuntu and just needs the correct ARM toolchain installed? Are there more instructions for someone like me who isn't experienced with cross compiling?

madhavajay commented 6 years ago

Where do the NEON headers come from? Are they part of the ARM toolchain?

llhe commented 6 years ago

@madhavajay Yes, the toolchain contains libc/libc++/glibc/neon headers/libs. For the default Linaro toolchain, you don't need any installation steps, just run the previous bazel build -s --config aarch64_linux --define openmp=true --define opencl=true --define neon=true //mace/libmace:libmace.so command.

The default toolchain will be automatically downloaded here: https://github.com/XiaoMi/mace/blob/master/WORKSPACE#L150. You are free to change to your customized toolchain.
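The build command above can be wrapped in a small helper so the flags are easy to toggle. This is only a sketch: the function name mace_build_cmd is hypothetical and not part of MACE, and it uses only the configs (aarch64_linux, arm_linux) and --define flags that appear in this thread. It prints the command rather than running it, so it can be checked before a long build starts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MACE): print the bazel command for an
# ARM Linux config instead of running it, so it can be reviewed first.
# Usage: mace_build_cmd <config> [opencl]
#   <config>  aarch64_linux or arm_linux (the two configs named in this thread)
#   [opencl]  true or false (default: true)
mace_build_cmd() {
  local config="$1"
  local opencl="${2:-true}"
  case "$config" in
    aarch64_linux|arm_linux) ;;
    *) echo "unknown config: $config" >&2; return 1 ;;
  esac
  # echo joins its arguments with single spaces, giving one command line.
  echo "bazel build -s --config $config" \
       "--define openmp=true --define opencl=$opencl --define neon=true" \
       "//mace/libmace:libmace.so"
}

# Print the aarch64 command with OpenCL enabled.
mace_build_cmd aarch64_linux
```

To actually run the build, the printed line can be pasted into a shell from the MACE source root. Passing false as the second argument gives a CPU-only build.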

tataganesh95 commented 6 years ago

@llhe Sorry for extending this discussion, but I am not very experienced with cross compilation and am a little confused about the model deployment process. Here are the steps I followed for cross compilation (note: I used the MACE lite edition Docker image):

  1. Changed build-standalone-lib.sh to cross-compile for aarch64. This creates the shared and static libraries.
  2. Convert the model ( In this case, the mobilenetV2 model ) using python tools/converter.py convert --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml with the configuration file present in mace-models. This generates the .pb and .data files.
  3. Now, I am assuming that running python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --example generates an executable in bazel-bin, and that this executable, along with the static and shared libraries, is to be deployed on the target machine (target OS: Linux, target architecture: aarch64)? Thank you!

llhe commented 6 years ago

@tataganesh95 Do you need Android aarch64 or Linux aarch64? They are different ABIs and need different toolchains. Currently Android build is well supported and you can follow the steps in the documents.

This issue addresses the Linux aarch64 build, which is not fully supported yet (you can use the previously mentioned bazel command to compile). The Python tools wrapper is not well supported for now (e.g. tools/converter.py may not work, since it assumes the device is reachable through adb, which is not true for Linux aarch64 boards); we will be working on these tasks.

llhe commented 6 years ago

@tataganesh95 If you want to try ARM Linux aarch64 before all the tools are ready, you can build libmace.so with bazel build -s --config aarch64_linux --define openmp=true --define opencl=true --define neon=true //mace/libmace:libmace.so and do the model conversion using the current tools.

tataganesh95 commented 6 years ago

"Do you need Android aarch64 or Linux aarch64?"

Linux aarch64. Cool! I will wait for Linux aarch64 to be fully supported. I was just trying to understand whether I am cross-compiling MACE for Linux aarch64 correctly.

"If you want to try with ARM Linux aarch64 before all the tools ready, you can build libmace.so by bazel build -s --config aarch64_linux --define openmp=true --define opencl=true --define neon=true //mace/libmace:libmace.so and make the model conversion using the current tools."

I have done the same, and ran the example script as well. But since my target OS is Linux, I am not sure how I am supposed to run example.cc. I could see that an executable named example_static is created when I run python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --example. When I ran this executable on my aarch64 device, I was able to pass command-line parameters to it (--input_node, --output_node, --device, etc.) and run it (I am still not getting the desired output for MobileNet-V2, though).

I just wanted to know whether the steps I followed for cross-compiling MACE and running the example on the target device are right, or whether I have missed something (the steps are listed in my previous comment).

Eagerly looking forward to tools that can facilitate cross compilation for Arm linux! Thank you once again for such a prompt response!

llhe commented 6 years ago

@tataganesh95 The result of python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --example currently targets the Android ABI. Its behavior is undefined when you run the binary on a normal ARM Linux system. You can check the difference here: https://wiki.linaro.org/WorkingGroups/ToolChain/FAQ#What_is_the_difference_between_arm-linux-androideabi_arm-linux-gnueabi_toolchain_linux_toolchain.3F
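One quick way to tell which ABI a dynamically linked binary was built for is to look at its ELF program interpreter: Android binaries request /system/bin/linker or /system/bin/linker64, while glibc-based ARM Linux binaries request a loader such as /lib/ld-linux-aarch64.so.1. The sketch below assumes readelf from binutils is installed; the function names classify_interpreter and binary_abi are made up for illustration and are not part of MACE.

```shell
#!/usr/bin/env bash
# Classify a binary's target userland from its ELF program interpreter path.
# The path is what `readelf -l <binary>` reports as
# "Requesting program interpreter".
classify_interpreter() {
  case "$1" in
    /system/bin/linker|/system/bin/linker64)
      echo "android" ;;
    /lib/ld-linux-aarch64.so.1|/lib/ld-linux-armhf.so.3|/lib/ld-linux.so.3)
      echo "gnu-linux" ;;
    "")
      # No interpreter found: statically linked, or not a readable ELF file.
      echo "static-or-unknown" ;;
    *)
      echo "unknown" ;;
  esac
}

# Hypothetical wrapper: extract the interpreter with readelf and classify it.
binary_abi() {
  local interp
  interp="$(readelf -l "$1" 2>/dev/null |
    sed -n 's/.*program interpreter: \(.*\)\]/\1/p')"
  classify_interpreter "$interp"
}

classify_interpreter /system/bin/linker64        # prints: android
classify_interpreter /lib/ld-linux-aarch64.so.1  # prints: gnu-linux
```

Running binary_abi on the generated example binary on the build host would show whether it was linked for Android before it is copied to a Linux board.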

tataganesh95 commented 6 years ago

@llhe I wasn't aware of that. Thank you! I will wait till the tools for ARM linux are built.

madhavajay commented 6 years ago

@tataganesh95 I was able to skip the Android stuff just by installing adb which is really easy: https://github.com/XiaoMi/mace/issues/176

I didn't install the NDK; to get rid of the adb error, I installed adb from here:
https://askubuntu.com/questions/34702/how-do-i-set-up-android-adb

Then:

python tools/converter.py convert --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml

tataganesh95 commented 6 years ago

@madhavajay I was able to run the convert script and obtained the .data and .pb files (for MobileNet). But I was trying to run example.cc through the same script. With a few minor changes I was able to run that as well, which in turn generated an executable, example_static. I ran this on my target machine and it ran successfully but generated the wrong output for a sample image (I was testing the Grace Hopper image), so I was wondering whether I missed a step or am doing something wrong here.

madhavajay commented 6 years ago

@tataganesh95 you have gotten further than me. I literally just got it compiled and ran the test code in the Makefile project. My task was to evaluate whether it was possible; I haven't had a chance to actually use it yet. Sorry. :) Please let me know if you solve the issue, as I will likely have the same problems.

pathwayai commented 6 years ago

@llhe is there any update on when Linux aarch64 support might be ready? Do you have any provisional benchmark results for it?

madhavajay commented 6 years ago

I ran this:

$ bazel build --config arm_linux --define openmp=true --define opencl=true --define neon=true //mace/libmace:libmace.so --sandbox_debug

Got this issue:

arm-linux-gnueabihf-gcc: error: mace_version_script.lds: No such file or directory

Full output:

INFO: Analysed target //mace/libmace:libmace.so (0 packages loaded).
INFO: Found 1 target...
ERROR: /home/pathwayai/mace/mace/libmace/BUILD:47:1: Linking of rule '//mace/libmace:libmace.so' failed (Exit 1): linux-sandbox failed: error executing command
  (cd /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/execroot/mace && \
  exec env - \
    PATH=/home/pathwayai/bin:/home/pathwayai/.local/bin:/home/pathwayai/caffe/build/install/bin:/home/pathwayai/bin:/home/pathwayai/.local/bin:/home/pathwayai/caffe/build/install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/pathwayai/bin:/home/pathwayai/bin \
    PWD=/proc/self/cwd \
    TMPDIR=/tmp \
  /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/execroot/mace/_bin/linux-sandbox -t 15 -w /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/sandbox/linux-sandbox/1/execroot/mace -w /tmp -w /dev/shm -D -- tools/arm_compiler/linaro_linux_gcc/arm-linux-gnueabihf-gcc -shared -o bazel-out/armeabi-v7a-fastbuild/bin/mace/libmace/libmace.so -Wl,-soname,libmace.so -Wl,--version-script mace_version_script.lds -fopenmp '--sysroot=external/gcc_linaro_7_3_1_arm_linux_gnueabihf/arm-linux-gnueabihf/libc' '-fuse-ld=gold' -Wl,-no-as-needed -no-canonical-prefixes -v -Wl,-z,relro,-z,now '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,-S -Wl,@bazel-out/armeabi-v7a-fastbuild/bin/mace/libmace/libmace.so-2.params)
src/main/tools/linux-sandbox.cc:154: linux-sandbox-pid1 has PID 28549
src/main/tools/linux-sandbox-pid1.cc:175: working dir: /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/sandbox/linux-sandbox/1/execroot/mace
src/main/tools/linux-sandbox-pid1.cc:194: writable: /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/sandbox/linux-sandbox/1/execroot/mace
src/main/tools/linux-sandbox-pid1.cc:194: writable: /tmp
src/main/tools/linux-sandbox-pid1.cc:194: writable: /dev/shm
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /dev
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /dev/pts
src/main/tools/linux-sandbox-pid1.cc:265: remount rw: /dev/shm
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /dev/hugepages
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /dev/mqueue
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /run
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /run/lock
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /run/user/1005
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/kernel/security
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/systemd
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/pids
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/rdma
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/cpu,cpuacct
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/blkio
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/perf_event
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/devices
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/cpuset
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/net_cls,net_prio
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/hugetlb
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/freezer
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/cgroup/memory
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/pstore
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/kernel/debug
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/kernel/debug/tracing
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/fs/fuse/connections
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /sys/kernel/config
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /proc
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /proc/sys/fs/binfmt_misc
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /proc/sys/fs/binfmt_misc
src/main/tools/linux-sandbox-pid1.cc:265: remount ro: /var/lib/lxcfs
src/main/tools/linux-sandbox-pid1.cc:265: remount rw: /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/sandbox/linux-sandbox/1/execroot/mace
src/main/tools/linux-sandbox-pid1.cc:265: remount rw: /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/sandbox/linux-sandbox/1/execroot/mace
src/main/tools/linux-sandbox-pid1.cc:265: remount rw: /tmp
src/main/tools/linux-sandbox-pid1.cc:265: remount rw: /dev/shm
src/main/tools/process-tools.cc:118: sigaction(32, &sa, nullptr) failed
src/main/tools/process-tools.cc:118: sigaction(33, &sa, nullptr) failed
arm-linux-gnueabihf-gcc: error: mace_version_script.lds: No such file or directory
Using built-in specs.
COLLECT_GCC=external/gcc_linaro_7_3_1_arm_linux_gnueabihf/bin/arm-linux-gnueabihf-gcc
COLLECT_LTO_WRAPPER=external/gcc_linaro_7_3_1_arm_linux_gnueabihf/bin/../libexec/gcc/arm-linux-gnueabihf/7.3.1/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: '/home/tcwg-buildslave/workspace/tcwg-make-release/builder_arch/amd64/label/tcwg-x86_64-build/target/arm-linux-gnueabihf/snapshots/gcc.git~linaro-7.3-2018.05/configure' SHELL=/bin/bash --with-mpc=/home/tcwg-buildslave/workspace/tcwg-make-release/builder_arch/amd64/label/tcwg-x86_64-build/target/arm-linux-gnueabihf/_build/builds/destdir/x86_64-unknown-linux-gnu --with-mpfr=/home/tcwg-buildslave/workspace/tcwg-make-release/builder_arch/amd64/label/tcwg-x86_64-build/target/arm-linux-gnueabihf/_build/builds/destdir/x86_64-unknown-linux-gnu --with-gmp=/home/tcwg-buildslave/workspace/tcwg-make-release/builder_arch/amd64/label/tcwg-x86_64-build/target/arm-linux-gnueabihf/_build/builds/destdir/x86_64-unknown-linux-gnu --with-gnu-as --with-gnu-ld --disable-libmudflap --enable-lto --enable-shared --without-included-gettext --enable-nls --with-system-zlib --disable-sjlj-exceptions --enable-gnu-unique-object --enable-linker-build-id --disable-libstdcxx-pch --enable-c99 --enable-clocale=gnu --enable-libstdcxx-debug --enable-long-long --with-cloog=no --with-ppl=no --with-isl=no --disable-multilib --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --with-tune=cortex-a9 --with-arch=armv7-a --enable-threads=posix --enable-multiarch --enable-libstdcxx-time=yes --enable-gnu-indirect-function --with-build-sysroot=/home/tcwg-buildslave/workspace/tcwg-make-release/builder_arch/amd64/label/tcwg-x86_64-build/target/arm-linux-gnueabihf/_build/sysroots/arm-linux-gnueabihf --with-sysroot=/home/tcwg-buildslave/workspace/tcwg-make-release/builder_arch/amd64/label/tcwg-x86_64-build/target/arm-linux-gnueabihf/_build/builds/destdir/x86_64-unknown-linux-gnu/arm-linux-gnueabihf/libc --enable-checking=release --disable-bootstrap --enable-languages=c,c++,fortran,lto --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=arm-linux-gnueabihf 
--prefix=/home/tcwg-buildslave/workspace/tcwg-make-release/builder_arch/amd64/label/tcwg-x86_64-build/target/arm-linux-gnueabihf/_build/builds/destdir/x86_64-unknown-linux-gnu
Thread model: posix
gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)
src/main/tools/linux-sandbox-pid1.cc:437: waitpid returned 2
src/main/tools/linux-sandbox-pid1.cc:457: child exited with code 1
src/main/tools/linux-sandbox.cc:204: child exited normally with exitcode 1
Target //mace/libmace:libmace.so failed to build
INFO: Elapsed time: 28.551s, Critical Path: 0.29s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

madhavajay commented 6 years ago

@llhe any idea why I can't build the same cross-compiled libmace.so that the GitLab CI file says it builds?

pathwayai commented 6 years ago

Anyone know how to solve the above? I'm also trying to solve this.

llhe commented 6 years ago

@pathwayai Do you have the same problem?

llhe commented 6 years ago

@madhavajay Can you reproduce this error by executing this in the shell?

cd /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/execroot/mace

/home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/execroot/mace/_bin/linux-sandbox -t 15 -w /home/pathwayai/.cache/bazel/_bazel_pathwayai/c7986ad00fc23123aa9184aa80c62d45/sandbox/linux-sandbox/1/execroot/mace -w /tmp -w /dev/shm -D -- tools/arm_compiler/linaro_linux_gcc/arm-linux-gnueabihf-gcc -shared -o bazel-out/armeabi-v7a-fastbuild/bin/mace/libmace/libmace.so -Wl,-soname,libmace.so -Wl,--version-script mace_version_script.lds -fopenmp '--sysroot=external/gcc_linaro_7_3_1_arm_linux_gnueabihf/arm-linux-gnueabihf/libc' '-fuse-ld=gold' -Wl,-no-as-needed -no-canonical-prefixes -v -Wl,-z,relro,-z,now '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,-S -Wl,@bazel-out/armeabi-v7a-fastbuild/bin/mace/libmace/libmace.so-2.params

Alnlll commented 5 years ago

Based on this earlier suggestion:

"Before that, you can simply build it with the following commands (not tested yet): bazel build -s --config aarch64_linux --define openmp=true --define opencl=true --define neon=true //mace/libmace:libmace.so"

I modified the "tools/build-standalone-lib.sh" to get aarch64 libraries:

echo "build shared lib for aarch64 + cpu"
bazel build -s --config optimization --config aarch64_linux --define openmp=true --define opencl=false --define neon=true //mace/libmace:libmace_dynamic
# bazel build --config android --config optimization mace/libmace:libmace_dynamic --define neon=true --define openmp=true --define opencl=true --define quantize=true --cpu=arm64-v8a
cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/aarch64/cpu/

if [[ "$OSTYPE" != "darwin"* ]];then
    echo "build shared lib for linux-x86-64"
    bazel build mace/libmace:libmace_dynamic --config optimization --define quantize=true --define openmp=true
    cp bazel-bin/mace/libmace/libmace.so $LIB_DIR/linux-x86-64/
fi

I then got this benchmark result on mobilenet-v1:

---------------------------------------------------------------------
                               Warm Up
----------------------------------------------------------------------
| round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |   std |
----------------------------------------------------------------------
|     1 |   598.455 |  598.455 | 598.455 | 598.455 | 598.455 | 0.000 |
----------------------------------------------------------------------

-----------------------------------------------------------------------
                         Run without statistics
------------------------------------------------------------------------
| round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |     std |
------------------------------------------------------------------------
|    18 |   566.921 |  566.962 | 566.583 | 568.177 | 566.871 | 315.777 |
------------------------------------------------------------------------

-----------------------------------------------------------------------
                          Run with statistics
------------------------------------------------------------------------
| round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |     std |
------------------------------------------------------------------------
|    18 |   568.146 |  566.793 | 566.776 | 570.255 | 567.344 | 851.766 |
------------------------------------------------------------------------

However, compared with the benchmark result for the same model using the latest release version of MACE, without building libs for aarch64, this is a severe latency regression:

---------------------------------------------------------------------
                               Warm Up
----------------------------------------------------------------------
| round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |   std |
----------------------------------------------------------------------
|     1 |   229.561 |  229.561 | 229.561 | 229.561 | 229.561 | 0.000 |
----------------------------------------------------------------------

-------------------------------------------------------------------------
                          Run without statistics
--------------------------------------------------------------------------
| round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |       std |
--------------------------------------------------------------------------
|    48 |   216.909 |  205.191 | 204.749 | 424.953 | 210.717 | 31344.958 |
--------------------------------------------------------------------------

------------------------------------------------------------------------
                          Run with statistics
-------------------------------------------------------------------------
| round | first(ms) | curr(ms) | min(ms) | max(ms) | avg(ms) |      std |
-------------------------------------------------------------------------
|    49 |   212.687 |  205.001 | 204.994 | 212.687 | 205.952 | 1464.004 |
-------------------------------------------------------------------------
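To put a number on the gap, the avg(ms) columns of the two runs can be divided. This is just arithmetic on the figures in the tables above; the helper name latency_ratio is made up for illustration.

```shell
#!/usr/bin/env bash
# Ratio of two average latencies, rounded to two decimals.
# Usage: latency_ratio <slower_avg_ms> <faster_avg_ms>
latency_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# avg(ms) from the two "Run with statistics" tables above.
latency_ratio 567.344 205.952   # prints 2.75
```

So the self-built aarch64 library is roughly 2.75x slower than the release build on this model.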


llhe commented 5 years ago

@Alnlll You can try the official support. If there is an abnormal performance degradation, please file a new issue.

llhe commented 5 years ago

Closing this, as it is now officially supported.

ysyyork commented 5 years ago

Has anyone already got benchmarks on the RK33XX series? We are using the RK3399 and we'd love to leverage the Mali-T860 on the board, but it seems a hard task. I just want to get a sense of whether the GPU can actually outperform the CPU, because I noticed in the conversation above that the GPU is not always faster than the CPU on mobile platforms. BTW, this is a really awesome project! Thanks guys!

nolanliou commented 5 years ago

@ysyyork We have tested mobilenet-v1 on the RK3399 with the GPU. Only the buffer-based OpenCL implementation could outperform the CPU, but only a small number of ops support buffer-based OpenCL for now. For detailed usage, please refer to the documentation.

ysyyork commented 5 years ago

@nolanliou thanks so much! This is very helpful