Closed liyancas closed 5 years ago
> Another question: For Inception V3, the performance of DSP is ~132x faster than CPU. Is it normal?
The DSP number looks problematic; SNPE may have some errors on Snapdragon 821. @lee-bin @lydoc Can you take a look?
> Another question: For Inception V3, the performance of DSP is ~132x faster than CPU. Is it normal?
I tested it on msm8996 and got the results below. You can check the benchmark log; maybe it did not exit normally?
model_name | device_name | soc | abi | runtime | MACE | SNPE | NCNN | TFLITE
---|---|---|---|---|---|---|---|---
InceptionV3 | MI 5s | msm8996 | armeabi-v7a | DSP | 67.856 | | |
VGG16 | MI 5s | msm8996 | armeabi-v7a | DSP | 141.415 | | |
I benchmarked the models on the OnePlus 3T platform. The performance of the TFLite quantized models is worse than the float models. Has anyone run into the same issue?
The command I used:
python tools/benchmark.py --output_dir=output --frameworks=all \
--runtimes=all --model_names=all \
--target_abis=arm64-v8a
model_name | device_name | soc | abi | runtime | MACE | SNPE | NCNN | TFLITE
---|---|---|---|---|---|---|---|---
InceptionV3 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 886.312 | 664.404 | 1578.295 | 997
InceptionV3 | ONEPLUS A3010 | msm8996 | arm64-v8a | DSP | | 4.996 | |
InceptionV3 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 153.049 | 141.246 | |
InceptionV3Quant | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | | | | 1014.75
MobileNetV1 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 52.046 | 385.444 | 37.883 | 71.367
MobileNetV1 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 25.267 | 24.441 | |
MobileNetV1Quant | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 36.743 | | | 145.778
MobileNetV2 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 40.625 | 413.553 | 29.208 | 76.021
MobileNetV2 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 17.546 | 14.966 | |
MobileNetV2Quant | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 28.679 | | | 294.099
SqueezeNetV11 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 37.453 | 59.481 | 21.376 |
SqueezeNetV11 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 20.001 | 17.986 | |
VGG16 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 452.521 | 1002.442 | 821.195 |
VGG16 | ONEPLUS A3010 | msm8996 | arm64-v8a | DSP | 136.465 | | |
VGG16 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 196.507 | | |
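As a side note, the "~132x" figure quoted earlier appears to come straight from these numbers: dividing the InceptionV3 SNPE CPU latency by the suspicious DSP latency (assuming the lone DSP number, 4.996, belongs to the SNPE column) gives roughly that ratio:

```python
# Latencies (ms) taken from the benchmark table above.
snpe_cpu_ms = 664.404  # InceptionV3, SNPE, CPU
snpe_dsp_ms = 4.996    # InceptionV3, DSP (assumed to be the SNPE column)

speedup = snpe_cpu_ms / snpe_dsp_ms
print(f"DSP speedup over CPU: ~{speedup:.0f}x")  # -> ~133x
```

A ~133x speedup for InceptionV3 is implausible, which supports the suspicion that the DSP run did not exit normally.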
This issue was caused by the num_threads argument. We bind the benchmark to the available big cores with the taskset command, and the default number of threads for the TFLITE benchmark is 4. The msm8996 has 2 big cores and 2 little cores, so you can use:
python tools/benchmark.py --output_dir=output --frameworks=all \
--runtimes=all --model_names=all \
--target_abis=arm64-v8a \
--num_threads=2
Besides, on some SoCs such as msm8996, it may be faster to use all of the CPUs instead of binding to the big cores only. You can change the relevant code: https://github.com/XiaoMi/mobile-ai-bench/blob/ff6667dafe6189b04724e583913d968322ca7c0e/tools/sh_commands.py#L394
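If you do edit that code, it may help to know how a taskset affinity mask is built from core IDs. Here is a minimal sketch, assuming the typical msm8996 layout of little cores at CPUs 0-1 and big cores at CPUs 2-3 (verify the layout on your device, e.g. via /sys/devices/system/cpu/cpu*/cpufreq):

```python
def taskset_mask(core_ids):
    """Build the hex CPU-affinity mask that `taskset` expects:
    one bit per CPU, with bit N set for CPU N."""
    mask = 0
    for cpu in core_ids:
        mask |= 1 << cpu
    return hex(mask)

print(taskset_mask([2, 3]))        # big cores only -> 0xc
print(taskset_mask([0, 1, 2, 3]))  # all cores      -> 0xf
```

So switching from big-core binding to all cores is just a matter of widening the mask passed to taskset (0xc to 0xf on msm8996).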
@lee-bin Does the Xiaomi MI 5S support both OpenCL GPU and DSP? I tried a Pixel phone with msm8996 and got a runtime exception. The same issue is described at https://developer.qualcomm.com/forum/qdn-forums/software/snapdragon-neural-processing-engine-sdk/34526
@lydoc I will try again later. Many thanks.
@liyancas Yes, the MI 5S supports both OpenCL GPU and DSP, and SNPE works fine on the CPU/GPU/DSP of the MI 5S. So it seems to be a problem with the Pixel phone.
Google does not support OpenCL on its devices (an unofficial source says this is because the OpenCL trademark is held by Apple).
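A quick way to see whether a device's userspace actually ships an OpenCL runtime is to try loading the library. This is a generic sketch (not part of mobile-ai-bench); on Android you would run the equivalent check on-device, e.g. look for /vendor/lib64/libOpenCL.so:

```python
import ctypes
import ctypes.util

def opencl_available():
    """Return True if an OpenCL runtime library can be loaded
    on the current system, False otherwise."""
    name = ctypes.util.find_library("OpenCL") or "libOpenCL.so"
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

print(opencl_available())
```

Pixel phones typically do not ship libOpenCL.so, which is consistent with the runtime exception reported above.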