ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Asm int8 gemm result isn't correct #985

Closed: daoxian closed this issue 1 year ago

daoxian commented 2 years ago

Output of 'strings libarm_compute.so | grep arm_compute_version':
arm_compute_version=v22.02
Build options: {'Werror': '1', 'debug': '1', 'neon': '1', 'opencl': '0', 'os': 'linux', 'arch': 'arm64-v8.2-a-sve'}
Git hash=unknown
arm_compute_version.embed

Platform:
AArch64, Armv9, SVE supported

Operating System: Ubuntu 20.04.3 LTS

Problem description: I compiled "examples/neon_gemm_qasymm8.cpp" as a test and ran:
cd build
./neon_gemm_qasymm8 16 16 12
The result was correct, but the GEMM kernel used the C++ code path instead of the asm-optimised one.

Then I modified the code (screenshot attached):

and changed validate_arguments() in "src/cpu/kernels/CpuGemmLowpOffsetContributionOutputStageKernel.cpp" to:
ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(output, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::S32);

to force the kernel to use the asm-optimised algorithms. After that, the result isn't correct (screenshot attached).

Could anyone give me some hints on how to use the optimised int8 GEMM algorithms? Any tips are appreciated!

morgolock commented 1 year ago

Hi @daoxian

ACL detects the hardware capabilities (number of CPU cores and CPU features) and chooses the best kernel possible for that particular configuration. All of this is done when calling configure(), as in https://github.com/ARM-software/ComputeLibrary/blob/main/examples/neon_gemm_qasymm8.cpp#L220
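
As a rough sketch of that pattern (following the example linked above; the shapes and QuantizationInfo values below are illustrative placeholders, not taken from this issue):

```cpp
// Minimal sketch: set up a QASYMM8 low-precision GEMM and let configure()
// pick the best kernel for the running CPU. Shapes and quantization
// parameters are illustrative placeholders.
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
    const unsigned int M = 16, N = 16, K = 12;

    Tensor a, b, dst;

    // Quantized inputs carry a QuantizationInfo (scale, offset).
    a.allocator()->init(TensorInfo(TensorShape(K, M), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10)));
    b.allocator()->init(TensorInfo(TensorShape(N, K), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 5)));

    // Accumulate into S32; an output stage can requantize to QASYMM8 afterwards.
    dst.allocator()->init(TensorInfo(TensorShape(N, M), 1, DataType::S32));

    // configure() queries the CPU features and instantiates the best available
    // kernel (assembly where supported) for this data-type/shape combination.
    NEGEMMLowpMatrixMultiplyCore gemm;
    gemm.configure(&a, &b, nullptr, &dst);

    a.allocator()->allocate();
    b.allocator()->allocate();
    dst.allocator()->allocate();

    // ... fill a and b with quantized values here ...

    gemm.run();
    return 0;
}
```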

During configure() the operator instantiates the correct assembly kernel: https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp#L128
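
Rather than patching validate_arguments() inside the library to force a path, you can also check a configuration against the operator's public validate() first. A minimal sketch, assuming QASYMM8 inputs with an S32 output (shapes and quantization values are illustrative placeholders):

```cpp
// Minimal sketch: ask the operator whether a given tensor configuration is
// supported before modifying library internals. Values are illustrative.
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"

#include <iostream>

using namespace arm_compute;

int main()
{
    const TensorInfo a(TensorShape(12U, 16U), 1, DataType::QASYMM8, QuantizationInfo(0.5f, 10)); // M=16, K=12
    const TensorInfo b(TensorShape(16U, 12U), 1, DataType::QASYMM8, QuantizationInfo(0.25f, 5)); // K=12, N=16
    const TensorInfo dst(TensorShape(16U, 16U), 1, DataType::S32);                               // M=16, N=16

    const Status status = NEGEMMLowpMatrixMultiplyCore::validate(&a, &b, nullptr, &dst);
    if(bool(status))
    {
        std::cout << "Configuration supported" << std::endl;
    }
    else
    {
        std::cout << "Not supported: " << status.error_description() << std::endl;
    }
    return 0;
}
```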

Another example you can look at is the fixture in the validation suite: https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/fixtures/GEMMLowpFixture.h#L129

Hope this helps