ACL-lowp includes a NEON-based kernel for low-bit-width quantized matrix multiplication.
It is now integrated into the Arm Compute Library.
There is still room to optimize these kernels for specific devices by adjusting parameters such as tile size and loop unrolling factor.
However, tuning these kernel settings by hand is very inefficient.
The ANT DL compiler provides automatic kernel tuning functionality.
Integrating these kernels into the ANT DL compiler may therefore yield better performance.
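
To make the idea of automatic tuning concrete, the following is a minimal sketch in plain Python. It is not the ANT DL compiler's API and not the ACL-lowp kernel: `tiled_matmul` and `autotune` are hypothetical names, the exhaustive search over tile sizes is an assumed strategy, and a real tuner would time NEON kernels on the target device rather than interpreted Python loops.

```python
import time


def tiled_matmul(a, b, tile):
    """Multiply integer matrices a (n x k) and b (k x m) in square tiles."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Walk the output and the reduction dimension tile by tile.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


def autotune(a, b, candidates, repeats=3):
    """Time every candidate tile size and return (best_tile, timings)."""
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(a, b, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best  # keep the fastest of the repeated runs
    best_tile = min(timings, key=timings.get)
    return best_tile, timings


# Example inputs: 32 x 32 matrices with values in the 4-bit range 0..15.
size = 32
lhs = [[(i * 7 + j) % 16 for j in range(size)] for i in range(size)]
rhs = [[(i + j * 3) % 16 for j in range(size)] for i in range(size)]

best_tile, timings = autotune(lhs, rhs, candidates=[4, 8, 16, 32])
print("fastest tile size on this machine:", best_tile)
```

The chosen tile size depends on the machine running the search, which is the point: the search replaces per-device hand tuning. The same loop could be extended to other parameters, such as the loop unrolling factor, by widening the candidate set.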