Closed srikris-sridhar closed 8 months ago
Hi @srikris-sridhar
GPU Convolution for INT8 is slower than FP16
Just please clarify if you meant FP32 instead of FP16.
Do you mean that the INT8 model is slower than FP32 in GpuAcc running on the Mali-G68 GPU?
I meant FP16, not FP32. Sorry for the typo in the numbers.
This is technically the following (since TFLite downcasts to FP16 for inference)
CPU (INT8): 2.5 ms
CPU (FP16): 6.3 ms
GPU (INT8): 7.2 ms
GPU (FP16): 8.4 ms
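As a quick sanity check on those numbers, the implied INT8-vs-FP16 speedup on each backend can be computed directly (using only the timings reported above):

```python
# Reported inference times in milliseconds (from the comment above).
cpu_int8, cpu_fp16 = 2.5, 6.3
gpu_int8, gpu_fp16 = 7.2, 8.4

# INT8 vs FP16 speedup on each backend.
cpu_speedup = cpu_fp16 / cpu_int8   # ~2.52x faster with INT8 on CPU
gpu_speedup = gpu_fp16 / gpu_int8   # only ~1.17x on GPU

print(f"CPU INT8 speedup: {cpu_speedup:.2f}x")
print(f"GPU INT8 speedup: {gpu_speedup:.2f}x")
```

So INT8 gives roughly a 2.5x speedup on CPU but barely moves the needle on GPU, and GPU INT8 (7.2 ms) is still slower than CPU INT8 (2.5 ms).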
Hi @srikris-sridhar
I can confirm that I reproduced the problem on Mali-G57, see below:
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./ExecuteNetwork -c GpuAcc -m ./simple_conv_int8.tflite --iterations 12 | grep Inference
Info: Inference time: 41.74 ms
Info: Inference time: 30.60 ms
Info: Inference time: 31.26 ms
Info: Inference time: 30.85 ms
Info: Inference time: 29.64 ms
Info: Inference time: 30.25 ms
Info: Inference time: 30.24 ms
Info: Inference time: 30.35 ms
Info: Inference time: 30.75 ms
Info: Inference time: 31.25 ms
Info: Inference time: 30.72 ms
Info: Inference time: 30.00 ms
LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./ExecuteNetwork -c GpuAcc -m ./simple_conv_fp32.tflite --iterations 12 | grep Inference
Info: Inference time: 36.13 ms
Info: Inference time: 23.24 ms
Info: Inference time: 23.34 ms
Info: Inference time: 22.88 ms
Info: Inference time: 22.97 ms
Info: Inference time: 22.80 ms
Info: Inference time: 23.23 ms
Info: Inference time: 23.43 ms
Info: Inference time: 20.62 ms
Info: Inference time: 23.17 ms
Info: Inference time: 19.25 ms
Info: Inference time: 23.19 ms
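Averaging the two runs above (dropping the first, warm-up iteration of each) makes the gap explicit; a small sketch using the numbers from the logs:

```python
# Inference times from the ExecuteNetwork runs above (ms),
# excluding the first (warm-up) iteration of each run.
int8_ms = [30.60, 31.26, 30.85, 29.64, 30.25, 30.24,
           30.35, 30.75, 31.25, 30.72, 30.00]
fp32_ms = [23.24, 23.34, 22.88, 22.97, 22.80, 23.23,
           23.43, 20.62, 23.17, 19.25, 23.19]

avg_int8 = sum(int8_ms) / len(int8_ms)
avg_fp32 = sum(fp32_ms) / len(fp32_ms)

print(f"INT8 average: {avg_int8:.2f} ms")           # ~30.54 ms
print(f"FP32 average: {avg_fp32:.2f} ms")           # ~22.56 ms
print(f"INT8 / FP32:  {avg_int8 / avg_fp32:.2f}x")  # INT8 ~1.35x slower
```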
After discussing this with the team, it looks like some work is needed in the GEMM heuristics to choose the right block size.
Hope this helps.
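For readers unfamiliar with the term: GEMM kernels tile the matrices into blocks sized to fit the target's registers and cache, and a poorly chosen block size leaves the hardware underutilized. A minimal pure-Python sketch of a blocked GEMM, purely illustrative (ACL's actual OpenCL kernels and per-device heuristics are far more involved):

```python
def blocked_gemm(A, B, block=4):
    """Compute C = A @ B tile by tile; `block` stands in for the tile
    size that the GEMM heuristics must pick per device and data type."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Accumulate the product of one tile of A with one tile of B.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + block, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C
```

On a GPU the best block size depends on the data type (an INT8 tile packs 4x the elements of an FP32 tile into the same bytes), which is presumably where the heuristic goes wrong for INT8 here.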
Awesome. Thanks! Looking forward to seeing what can be done here. This seems consistent across many models.
Thanks for reporting this, we added the request to our backlog.
Closing the issue as we are already tracking it internally.
Based on an ARM-NN issue filed earlier, @TeresaARM requested that I file a separate issue with the ARM Compute Library.
Output of 'strings libarm_compute.so | grep arm_compute_version':
Platform: Samsung A33, Mali-G68 GPU
Operating System: Android 12
Problem description:
I've tried running a very simple convolution (two models attached, one in FP32 and one in INT8). Here is what I see on a Samsung A33 (Mali-G68 GPU): a good speed boost with INT8 on the CPU, but not on the GPU. Is this expected?
models.zip
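For context on what the INT8 model contains: TFLite's full-integer quantization stores each tensor as int8 values plus a scale and zero point, with real_value = scale * (int_value - zero_point). A small sketch of that affine quantize/dequantize round trip (the scale and zero point below are made-up example values, not taken from the attached models):

```python
def quantize(x, scale, zero_point):
    # real -> int8, clamped to the int8 range
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    # int8 -> approximate real value
    return scale * (q - zero_point)

scale, zp = 0.05, 3                  # hypothetical quantization parameters
x = 1.234
q = quantize(x, scale, zp)           # -> 28
x_hat = dequantize(q, scale, zp)     # -> 1.25 (quantization error ~0.016)
print(q, x_hat)
```

The point of this scheme is that the convolution's inner loop can run entirely on int8/int32 arithmetic, which is why INT8 is normally expected to be faster than FP16/FP32, not slower.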