chenbohua3 closed this issue 1 year ago
From S32 to S8, the conversion is performed via CpuGemmLowpOutputStage.cpp, which calls kernels::CpuGemmLowpQuantizedDownInt32toInt8ByFixedPointKernel. You could simply change the rounding policy there. I have modified the rounding policy of the saturating stage myself, so I am fairly sure it can be done.
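For reference, here is a minimal scalar sketch of what such an S32-to-S8 requantization step looks like and where the rounding policy enters. This is NOT the actual ACL kernel code (which uses fixed-point NEON intrinsics); the function names and the floating-point multiplier are illustrative assumptions only:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Round half away from zero for positive ties ("round to up" style).
int32_t round_half_up(double v)
{
    return static_cast<int32_t>(std::floor(v + 0.5));
}

// Round half to even (banker's rounding, the policy PyTorch uses).
int32_t round_half_even(double v)
{
    // std::nearbyint follows the current FP rounding mode, which defaults
    // to round-to-nearest-even.
    return static_cast<int32_t>(std::nearbyint(v));
}

// Hypothetical requantization of one S32 accumulator to S8: scale, round
// with the chosen policy, add the zero point, then saturate to [-128, 127].
int8_t requantize(int32_t acc, double scale, int32_t zero_point,
                  int32_t (*round_fn)(double))
{
    int32_t q = round_fn(static_cast<double>(acc) * scale) + zero_point;
    return static_cast<int8_t>(std::clamp(q, -128, 127));
}
```

On an exact tie the two policies diverge: with `acc = 5`, `scale = 0.5`, and `zero_point = 0`, the scaled value is 2.5, which `round_half_up` maps to 3 and `round_half_even` maps to 2. That single-ULP difference per tie is what changing the policy in the kernel addresses.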
@GGGGxxxxxxxxr Thanks a lot :) By the way, is there any API to configure this instead of modifying ACL's code? In my usage scenario, I have an inference framework that calls ACL to run the compute-intensive operators. If the policy could be configured through an API, I would not need to maintain a "rounding policy modified" version of ACL; I could just compile ACL from source and use it directly.
Hi @chenbohua3
There is no way to configure this in ACL, you have to make changes in:
Hope this helps.
After some effort, I failed to find where to set the rounding policy for CpuGemmLowpMatrixMultiplyCore (which can be used during the process of converting S32 to S8). So, is it possible to configure the rounding policy? By the way, this feature is very necessary: for example, the rounding policy of PyTorch is round_to_even, but the default policy of ACL is round_to_up. This leads to an accuracy gap between the result obtained during quantization-aware training in PyTorch and the one obtained from the corresponding real quantized ACL model.