chenbohua3 closed this issue 1 year ago
From S32 to S8, the conversion is performed via CpuGemmLowpOutputStage.cpp, which calls kernels::CpuGemmLowpQuantizedDownInt32toInt8ByFixedPointKernel. You could simply change the rounding policy there. I have modified the rounding policy of the saturating stage myself, so I am fairly sure it can be done.
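For reference, here is a minimal scalar sketch of what such an S32-to-S8 requantization step looks like and where the rounding policy enters. This is NOT the actual ACL kernel code (which uses fixed-point NEON intrinsics); the function names and the floating-point multiplier are illustrative assumptions only:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Round half away from zero for positive ties ("round to up" style).
int32_t round_half_up(double v)
{
    return static_cast<int32_t>(std::floor(v + 0.5));
}

// Round half to even (banker's rounding, the policy PyTorch uses).
int32_t round_half_even(double v)
{
    // std::nearbyint follows the current FP rounding mode, which defaults
    // to round-to-nearest-even.
    return static_cast<int32_t>(std::nearbyint(v));
}

// Hypothetical requantization of one S32 accumulator to S8: scale, round
// with the chosen policy, add the zero point, then saturate to [-128, 127].
int8_t requantize(int32_t acc, double scale, int32_t zero_point,
                  int32_t (*round_fn)(double))
{
    int32_t q = round_fn(static_cast<double>(acc) * scale) + zero_point;
    return static_cast<int8_t>(std::clamp(q, -128, 127));
}
```

On an exact tie the two policies diverge: with `acc = 5`, `scale = 0.5`, and `zero_point = 0`, the scaled value is 2.5, which `round_half_up` maps to 3 and `round_half_even` maps to 2. That single-ULP difference per tie is what changing the policy in the kernel addresses.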
@GGGGxxxxxxxxr Thanks a lot :) By the way, is there any API to configure this instead of modifying ACL's code? In my usage scenario, I have an inference framework that calls ACL to run the compute-intensive operators. If the policy could be configured through an API, I would not need to maintain a "rounding policy modified" version of ACL; I could just compile ACL from source and use it directly.
Hi @chenbohua3
There is no way to configure this in ACL, you have to make changes in:
Hope this helps.
After some effort, I failed to find where to set the rounding policy for CpuGemmLowpMatrixMultiplyCore (which can be used during the process of converting S32 to S8). So, is it possible to configure the rounding policy? By the way, this feature is very necessary: for example, the rounding policy of PyTorch is round_to_even, but the default policy of ACL is round_to_up. This leads to an accuracy gap between the result obtained during quantization-aware training in PyTorch and the one obtained from the corresponding real quantized ACL model.