AI-performance / embedded-ai.bench

Benchmark for embedded-AI deep learning inference engines such as NCNN / TNN / MNN / TensorFlow Lite, etc.
https://www.ai-performance.com

MindSpore Lite on-device performance, preliminary test results #35

Open ysh329 opened 3 years ago

ysh329 commented 3 years ago

835/armv8/CPU
tf_v1: 104ms  tf_v2: 79ms  caffe_v1: 115ms  caffe_v2: 108ms

835/armv8/GPU
tf_v1: 55ms  tf_v2: 36ms  caffe_v1: 56ms  caffe_v2: 46ms

855/armv8/CPU
tf_v1: 18ms  tf_v2: 13ms  caffe_v1: 18ms  caffe_v2: 16ms

855/armv8/GPU
tf_v1: 37ms  tf_v2: 25ms  caffe_v1: 36ms  caffe_v2: 28ms

980/armv8/CPU
tf_v1: 21ms  tf_v2: 14ms  caffe_v1: 21ms  caffe_v2: 18ms

980/armv8/GPU
tf_v1: 31ms  tf_v2: 29ms  caffe_v1: 31ms  caffe_v2: 30ms

990/armv8/CPU
tf_v1: 36ms  tf_v2: 26ms  caffe_v1: 36ms  caffe_v2: 30ms

990/armv8/CPU
tf_v1: 19ms  tf_v2: 13ms  caffe_v1: 20ms  caffe_v2: 17ms

845/armv8/GPU
tf_v1: 44ms  tf_v2: 30ms  caffe_v1: 42ms  caffe_v2: 36ms

845/armv8/CPU
tf_v1: 71ms  tf_v2: 59ms  caffe_v1: 85ms  caffe_v2: 80ms


SoC  Model              Avg(ms)  Backend  ThreadNum
990  caffe_mobilenetv1  0        CPU      1
990  caffe_mobilenetv2  0        CPU      1
990  tf_mobilenetv1     0        CPU      1
990  tf_mobilenetv2     0        CPU      1
990  caffe_mobilenetv1  0        GPU      1
990  caffe_mobilenetv2  0        GPU      1
990  tf_mobilenetv1     0        GPU      1
990  tf_mobilenetv2     0        GPU      1
980  caffe_mobilenetv1  0        CPU      1
980  caffe_mobilenetv2  0        CPU      1
980  tf_mobilenetv1     0        CPU      1
980  tf_mobilenetv2     0        CPU      1
980  caffe_mobilenetv1  0        GPU      1
980  caffe_mobilenetv2  0        GPU      1
980  tf_mobilenetv1     0        GPU      1
980  tf_mobilenetv2     0        GPU      1
855  caffe_mobilenetv1  0        CPU      1
855  caffe_mobilenetv2  0        CPU      1
855  tf_mobilenetv1     0        CPU      1
855  tf_mobilenetv2     0        CPU      1
855  caffe_mobilenetv1  0        GPU      1
855  caffe_mobilenetv2  0        GPU      1
855  tf_mobilenetv1     0        GPU      1
855  tf_mobilenetv2     0        GPU      1
845  caffe_mobilenetv1  0        CPU      1
845  caffe_mobilenetv2  0        CPU      1
845  tf_mobilenetv1     0        CPU      1
845  tf_mobilenetv2     0        CPU      1
845  caffe_mobilenetv1  0        GPU      1
845  caffe_mobilenetv2  0        GPU      1
845  tf_mobilenetv1     0        GPU      1
845  tf_mobilenetv2     0        GPU      1
835  caffe_mobilenetv1  0        CPU      1
835  caffe_mobilenetv2  0        CPU      1
835  tf_mobilenetv1     0        CPU      1
835  tf_mobilenetv2     0        CPU      1
835  caffe_mobilenetv1  0        GPU      1
835  caffe_mobilenetv2  0        GPU      1
835  tf_mobilenetv1     0        GPU      1
835  tf_mobilenetv2     0        GPU      1
ysh329 commented 3 years ago

Tested MindSpore Lite v1.1.0 in February 2021.

The MindSpore Lite release page does not provide prebuilt binaries, so a build from source is needed (though I later found prebuilt packages in the official docs). The dependencies are documented in detail, with links, at https://www.mindspore.cn/tutorial/lite/zh-CN/master/use/build.html. The MindSpore Lite runtime library itself is straightforward, but the converter pulls in quite a few more dependencies.

Building and installing the model conversion tool

After installing the dependencies above, you can build the converter. The default x86 build includes the model conversion tool, while the on-device builds do not. Run the following command to build a Release version for the x86_64 architecture, which produces the model converter, the benchmark, and the library cropping tool:

bash build.sh -I x86_64

A successful build ends with the following log:
[100%] Built target lite-test
Run CPack packaging tool...
CPack: Create package using TGZ
CPack: Install projects
CPack: - Run preinstall target for: Lite
CPack: - Install project: Lite []
CPack: -   Install component: inference-linux-x64
CPack: -   Install component: converter-linux-x64
CPack: Create package
CPack: - package: /home/xxxx/code/mindspore/output/tmp/mindspore-lite-1.1.0-converter-linux-x64.tar.gz generated.
CPack: - checksum file: /home/xxxx/code/mindspore/output/tmp/mindspore-lite-1.1.0-converter-linux-x64.tar.gz.sha256 generated.
CPack: - package: /home/xxxx/code/mindspore/output/tmp/mindspore-lite-1.1.0-inference-linux-x64.tar.gz generated.
CPack: - checksum file: /home/xxxx/code/mindspore/output/tmp/mindspore-lite-1.1.0-inference-linux-x64.tar.gz.sha256 generated.
---------------- mindspore lite: build success ----------------
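
The generated packages can optionally be verified against the checksum files shown in the log (a quick sketch, assuming the standard sha256sum output format):

cd output/tmp   # per the CPack log above; the packages may also be copied up to output/
sha256sum -c mindspore-lite-1.1.0-converter-linux-x64.tar.gz.sha256
sha256sum -c mindspore-lite-1.1.0-inference-linux-x64.tar.gz.sha256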

However, running the converter fails with the following error:

./converter: error while loading shared libraries: libglog.so.0: cannot open shared object file: No such file or directory

Extract mindspore-lite-1.1.0-converter-linux-x64.tar.gz from the output directory and enter mindspore-lite-1.1.0-converter-linux-x64/converter. Running ./converter directly complains about missing shared libraries; following the hints, copy all the corresponding .so files from the third-party and lib directories one level up into the converter directory, then run export LD_LIBRARY_PATH=., after which ./converter prints its usage help.
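
Concretely, the workaround looks like this (a sketch; directory names are taken from the description above and may differ slightly between versions):

tar xzf output/mindspore-lite-1.1.0-converter-linux-x64.tar.gz
cd mindspore-lite-1.1.0-converter-linux-x64/converter
# copy every shared library from the sibling third-party and lib directories
# next to the binary, then point the loader at the current directory
find ../third-party ../lib -name '*.so*' -exec cp {} . \;
export LD_LIBRARY_PATH=.
./converter_lite   # should now print the usage help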

Next, convert the Caffe and TFLite versions of MobileNet v1 and v2 into the MindSpore Lite model format:

# note: https://www.mindspore.cn/lite/tutorial/zh-CN/master/use/converter_tool.html

prefix_path="/home/yuanshuai/code/tmp/embedded-ai.bench/"
# caffe
./converter_lite --fmk=CAFFE --modelFile=${prefix_path}/models/caffe_mobilenetv1.prototxt --weightFile=${prefix_path}/models/caffe_mobilenetv1.caffemodel --outputFile=caffe_mobilenetv1
./converter_lite --fmk=CAFFE --modelFile=${prefix_path}/models/caffe_mobilenetv2.prototxt --weightFile=${prefix_path}/models/caffe_mobilenetv2.caffemodel --outputFile=caffe_mobilenetv2

# tflite
./converter_lite --fmk=TFLITE --modelFile=${prefix_path}/models/tf_mobilenetv1/tf_mobilenetv1.tflite --outputFile=tf_mobilenetv1
./converter_lite --fmk=TFLITE --modelFile=${prefix_path}/models/tf_mobilenetv2/tf_mobilenetv2.tflite --outputFile=tf_mobilenetv2
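
The four invocations above also collapse into a short loop if convenient (same flags, just iterating over the two MobileNet versions):

# same conversions as above, driven by a loop
for v in 1 2; do
  ./converter_lite --fmk=CAFFE \
    --modelFile=${prefix_path}/models/caffe_mobilenetv${v}.prototxt \
    --weightFile=${prefix_path}/models/caffe_mobilenetv${v}.caffemodel \
    --outputFile=caffe_mobilenetv${v}
  ./converter_lite --fmk=TFLITE \
    --modelFile=${prefix_path}/models/tf_mobilenetv${v}/tf_mobilenetv${v}.tflite \
    --outputFile=tf_mobilenetv${v}
done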

Conversion succeeds and yields the following four models:

  1. caffe_mobilenetv1.ms
  2. caffe_mobilenetv2.ms
  3. tf_mobilenetv1.ms
  4. tf_mobilenetv2.ms

Building the GPU library

The GPU build includes the CPU operators by default. Run the following commands to build the GPU MindSpore Lite libraries for arm64 and arm32:

bash build.sh -I arm64 -e gpu
bash build.sh -I arm32 -e gpu
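
Note that these Android builds need the NDK available; per the build docs linked above, the environment is pointed at it before running build.sh (the path below is only an example, not a required location):

# assumption: a local Android NDK install; adjust the path to your own
export ANDROID_NDK=/path/to/android-ndk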

When the build finishes, two packages appear in the output directory:

  1. mindspore-lite-1.1.0-inference-android-aarch32.tar.gz
  2. mindspore-lite-1.1.0-inference-android-aarch64.tar.gz

Benchmark

After extracting the two archives above, each contains a benchmark executable (under bin) as well as libmindspore-lite.so. adb push them to the phone together with the models converted earlier, then run the benchmark:
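
Which of the two packages to push depends on the phone's ABI; a quick host-side sketch (getprop and tar are standard tools, and the aarch64 package is picked here assuming an arm64-v8a device):

# check the device ABI: arm64-v8a -> aarch64 package, armeabi-v7a -> aarch32
adb shell getprop ro.product.cpu.abi
tar xzf mindspore-lite-1.1.0-inference-android-aarch64.tar.gz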


$ADB shell mkdir -p /data/local/tmp/mindspore

$ADB push ${prefix_path}/benchmark/benchmark /data/local/tmp/mindspore/benchmark
$ADB push ${prefix_path}/lib/libmindspore-lite.so /data/local/tmp/mindspore/
$ADB push ${prefix_path}/lib/liboptimize.so /data/local/tmp/mindspore/
$ADB push ${prefix_path}/lib/libc++_shared.so /data/local/tmp/mindspore/

$ADB push ./tf_mobilenetv1.ms /data/local/tmp/mindspore/tf_mobilenetv1.ms
$ADB push ./tf_mobilenetv2.ms /data/local/tmp/mindspore/tf_mobilenetv2.ms
$ADB push ./caffe_mobilenetv1.ms /data/local/tmp/mindspore/caffe_mobilenetv1.ms
$ADB push ./caffe_mobilenetv2.ms /data/local/tmp/mindspore/caffe_mobilenetv2.ms

$ADB shell chmod +x /data/local/tmp/mindspore/benchmark
# sanity check: run the binary once (needs LD_LIBRARY_PATH set to find the libs)
$ADB shell "export LD_LIBRARY_PATH=/data/local/tmp/mindspore/; /data/local/tmp/mindspore/benchmark"

# https://www.mindspore.cn/lite/tutorial/zh-CN/master/use/evaluating_the_model.html
#$ADB shell "export LD_LIBRARY_PATH=/data/local/tmp/mindspore/; \
#  /data/local/tmp/mindspore/benchmark \
#  --modelPath=<xxx> \
#  --device=<CPU|GPU> \
#  --cpuBindMode=<-1:midcore, 1: bigcore, 0:nobind> \
#  --numThreads=<2> \
#  --loopCount=10 \
#  --warmUpLoopCount=3 \
#  --enableFp16=<false|true>"

$ADB shell "export LD_LIBRARY_PATH=/data/local/tmp/mindspore/; \
  /data/local/tmp/mindspore/benchmark \
  --modelFile=/data/local/tmp/mindspore/tf_mobilenetv1.ms \
  --device=GPU \
  --cpuBindMode=1 \
  --numThreads=1 \
  --loopCount=1000 \
  --warmUpLoopCount=20 \
  --enableFp16=true"
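
For completeness, the same invocation can be wrapped in a loop over all four models and both backends (a convenience sketch using the same flags as the single run above):

# run every converted model on CPU and GPU with identical settings
for model in tf_mobilenetv1 tf_mobilenetv2 caffe_mobilenetv1 caffe_mobilenetv2; do
  for device in CPU GPU; do
    $ADB shell "export LD_LIBRARY_PATH=/data/local/tmp/mindspore/; \
      /data/local/tmp/mindspore/benchmark \
      --modelFile=/data/local/tmp/mindspore/${model}.ms \
      --device=${device} \
      --cpuBindMode=1 \
      --numThreads=1 \
      --loopCount=1000 \
      --warmUpLoopCount=20 \
      --enableFp16=true"
  done
done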