XiaoMi / mobile-ai-bench

Benchmarking Neural Network Inference on Mobile Devices
Apache License 2.0
353 stars 57 forks

The speed of quantized tflite model is slower than its float model. #20

Closed liyancas closed 5 years ago

liyancas commented 5 years ago

I benchmarked the models on a OnePlus 3T. The performance of the quantized TFLite models is worse than that of the float models. Has anyone run into the same issue?

The command I used:

python tools/benchmark.py --output_dir=output --frameworks=all \
--runtimes=all --model_names=all \
--target_abis=arm64-v8a

| model_name | device_name | soc | abi | runtime | MACE | SNPE | NCNN | TFLITE |
|---|---|---|---|---|---|---|---|---|
| InceptionV3 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 886.312 | 664.404 | 1578.295 | 997 |
| InceptionV3 | ONEPLUS A3010 | msm8996 | arm64-v8a | DSP | | 4.996 | | |
| InceptionV3 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 153.049 | 141.246 | | |
| InceptionV3Quant | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | | | | 1014.75 |
| MobileNetV1 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 52.046 | 385.444 | 37.883 | 71.367 |
| MobileNetV1 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 25.267 | 24.441 | | |
| MobileNetV1Quant | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 36.743 | | | 145.778 |
| MobileNetV2 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 40.625 | 413.553 | 29.208 | 76.021 |
| MobileNetV2 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 17.546 | 14.966 | | |
| MobileNetV2Quant | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 28.679 | | | 294.099 |
| SqueezeNetV11 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 37.453 | 59.481 | 21.376 | |
| SqueezeNetV11 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 20.001 | 17.986 | | |
| VGG16 | ONEPLUS A3010 | msm8996 | arm64-v8a | CPU | 452.521 | 1002.442 | 821.195 | |
| VGG16 | ONEPLUS A3010 | msm8996 | arm64-v8a | DSP | | 136.465 | | |
| VGG16 | ONEPLUS A3010 | msm8996 | arm64-v8a | GPU | 196.507 | | | |
liyancas commented 5 years ago

Another question: for InceptionV3, the DSP is ~132x faster than the CPU. Is this normal?

llhe commented 5 years ago

The DSP number looks problematic; SNPE may have some errors on Snapdragon 821. @lee-bin @lydoc Can you have a look?

lee-bin commented 5 years ago

> Another question: for InceptionV3, the DSP is ~132x faster than the CPU. Is this normal?

I tested it on msm8996 and got the results below. You can check the benchmark log; maybe it did not exit normally?

| model_name | device_name | soc | abi | runtime | SNPE |
|---|---|---|---|---|---|
| InceptionV3 | MI 5s | msm8996 | armeabi-v7a | DSP | 67.856 |
| VGG16 | MI 5s | msm8996 | armeabi-v7a | DSP | 141.415 |
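For reference, a quick ratio check using the SNPE numbers from the two tables (plain arithmetic, nothing framework-specific) shows why the 4.996 ms figure is implausible:

```python
snpe_cpu_ms = 664.404     # SNPE CPU, InceptionV3, from the first table
reported_dsp_ms = 4.996   # the suspicious SNPE DSP number
measured_dsp_ms = 67.856  # SNPE DSP on MI 5s (same SoC), table above

print(round(snpe_cpu_ms / reported_dsp_ms))  # the ~132x figure in question
print(round(snpe_cpu_ms / measured_dsp_ms))  # roughly 10x, a plausible DSP speedup
```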
lydoc commented 5 years ago

> I benchmarked the models on the OnePlus 3T platform. The performance of the quantized TFLite models is worse than that of the float models. Has anyone run into the same issue?


This issue was caused by the num_threads argument. For the TFLite benchmark we pin to the available big cores with the taskset command, and the default number of threads is 4. The msm8996 has 2 big cores and 2 little cores, so you can use:

python tools/benchmark.py --output_dir=output --frameworks=all \
--runtimes=all --model_names=all \
--target_abis=arm64-v8a \
--num_threads=2
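To illustrate the mismatch (a sketch, not the benchmark's actual code): on a big.LITTLE SoC the big cores are the ones with the highest maximum frequency, and the TFLite thread count should match how many of them the run is pinned to. The frequencies below are assumed values in the msm8996's ballpark, with 4 default threads contending for only 2 pinned cores:

```python
# Assumed cpuinfo_max_freq values (kHz) for CPUs 0-3 on an msm8996-like SoC
freqs = [1593600, 1593600, 2150400, 2150400]

# Big cores are those running at the highest max frequency
big_cores = [i for i, f in enumerate(freqs) if f == max(freqs)]
num_threads = len(big_cores)

print(big_cores, num_threads)  # 2 big cores -> use --num_threads=2
```

With the default of 4 threads, two threads per pinned core oversubscribe the big cluster, which is why the quantized TFLite runs came out slower than float.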

Besides, on some SoCs such as msm8996, it may be faster to use all CPU cores instead of binding to the big cores only. You can change the relevant code: https://github.com/XiaoMi/mobile-ai-bench/blob/ff6667dafe6189b04724e583913d968322ca7c0e/tools/sh_commands.py#L394
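As a sketch of what that change amounts to: taskset takes a hexadecimal CPU affinity mask, so binding to the big cores versus all cores is just a different mask (the core numbering below is an assumption, not taken from the repo):

```python
def cpu_mask(cores):
    """Build the hexadecimal affinity mask that taskset expects."""
    mask = 0
    for c in cores:
        mask |= 1 << c
    return format(mask, "x")

# Assuming the msm8996's big cores are CPUs 2 and 3:
print(cpu_mask([2, 3]))        # mask for the big cores only
print(cpu_mask([0, 1, 2, 3]))  # mask for all four cores
```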

liyancas commented 5 years ago

@lee-bin Does the XiaoMi 5S support both OpenCL GPU and DSP? I have tried a Pixel phone with msm8996 and got a runtime exception. The same issue can be found at https://developer.qualcomm.com/forum/qdn-forums/software/snapdragon-neural-processing-engine-sdk/34526

liyancas commented 5 years ago

@lydoc I will try again later. Many thanks.

lee-bin commented 5 years ago

@liyancas Yes, the MI 5S supports both OpenCL GPU and DSP, and SNPE works fine on the CPU/GPU/DSP of the MI 5S. So it seems to be a problem with the Pixel phone.

llhe commented 5 years ago

Google does not support OpenCL on their devices (an unofficial source says this is because the OpenCL trademark is held by Apple).