ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.

uint8 quantized model runs slower than fp32 model #987

Closed: liamsun2019 closed this issue 1 year ago

liamsun2019 commented 2 years ago

Hi author, I ran into an issue while doing inference on a Cortex-A55 (AArch64) with CpuAcc as the backend. There are two models: one is fp32 and the other is uint8 quantized. My tests showed that the fp32 model ran even faster than the uint8 quantized one, and I am curious why this happens. Please refer to the attachment for the two models. In addition, both the C++ parser mode and the delegate mode show the same issue. ReduceFp32ToFp16 is set to true in my tests. I'd appreciate your suggestions. Thanks.

Attachment: test.zip
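For context, a comparison along these lines can be set up via Arm NN's C++ parser path. This is a minimal sketch, not the reporter's actual harness: the model path `model.tflite`, the binding names `"input"`/`"output"`, and the float I/O are placeholder assumptions (a uint8-quantized model would bind uint8 tensors), and in practice you would run warm-up iterations and average many runs.

```cpp
// Sketch: load a .tflite model with the Arm NN TfLite parser, optimise it
// for the CpuAcc backend with ReduceFp32ToFp16 enabled, and time one
// inference. Model path, tensor names and I/O types are placeholders.
#include <armnn/ArmNN.hpp>
#include <armnnTfLiteParser/ITfLiteParser.hpp>
#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    using namespace armnn;

    // Parse the TfLite flatbuffer into an Arm NN network.
    auto parser = armnnTfLiteParser::ITfLiteParser::Create();
    INetworkPtr network = parser->CreateNetworkFromBinaryFile("model.tflite");

    // Create the runtime and optimise for the NEON-accelerated CPU backend.
    IRuntimePtr runtime = IRuntime::Create(IRuntime::CreationOptions());
    OptimizerOptions optOptions;
    optOptions.m_ReduceFp32ToFp16 = true; // as in the report above
    IOptimizedNetworkPtr optNet =
        Optimize(*network, {Compute::CpuAcc}, runtime->GetDeviceSpec(), optOptions);

    NetworkId netId;
    runtime->LoadNetwork(netId, std::move(optNet));

    // Bind I/O tensors; names, shapes and data types depend on the model.
    auto inputBinding  = parser->GetNetworkInputBindingInfo(0, "input");
    auto outputBinding = parser->GetNetworkOutputBindingInfo(0, "output");
    inputBinding.second.SetConstant(true); // required by newer Arm NN releases
    std::vector<float> inputData(inputBinding.second.GetNumElements(), 0.0f);
    std::vector<float> outputData(outputBinding.second.GetNumElements());

    InputTensors  inputs  {{inputBinding.first,
                            ConstTensor(inputBinding.second, inputData.data())}};
    OutputTensors outputs {{outputBinding.first,
                            Tensor(outputBinding.second, outputData.data())}};

    // Time a single inference on the loaded network.
    auto t0 = std::chrono::steady_clock::now();
    runtime->EnqueueWorkload(netId, inputs, outputs);
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "Inference took "
              << std::chrono::duration<double, std::milli>(t1 - t0).count()
              << " ms\n";
    return 0;
}
```

Running the same harness over both the fp32 and the uint8 model (with matching tensor types) is what produces the timing comparison described above.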

morgolock commented 1 year ago

Closing this as a duplicate of https://github.com/ARM-software/armnn/issues/667