Closed: daoxian closed this issue 2 years ago
Hi @daoxian
ACL detects the hardware capabilities (number of CPU cores and CPU features) and chooses the best kernel available for that particular configuration. All of this is done when calling configure(),
as in https://github.com/ARM-software/ComputeLibrary/blob/main/examples/neon_gemm_qasymm8.cpp#L220
During configure(), the operator instantiates the correct assembly kernel: https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemmLowpMatrixMultiplyCore.cpp#L128
Another example you can look at is the fixture in the validation suite: https://github.com/ARM-software/ComputeLibrary/blob/main/tests/validation/fixtures/GEMMLowpFixture.h#L129
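For reference, a minimal sketch of that flow (assuming the v22.02 API; the shapes and quantization parameters below are arbitrary placeholders). The kernel selection happens inside configure(), before run() is ever called:

#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
    const int M = 16, N = 16, K = 12;

    // QASYMM8 inputs; with no output stage the low-precision GEMM accumulates into S32.
    Tensor a, b, dst;
    a.allocator()->init(TensorInfo(TensorShape(K, M), 1, DataType::QASYMM8, QuantizationInfo(1.f / 255, 0)));
    b.allocator()->init(TensorInfo(TensorShape(N, K), 1, DataType::QASYMM8, QuantizationInfo(1.f / 255, 0)));
    dst.allocator()->init(TensorInfo(TensorShape(N, M), 1, DataType::S32));

    // configure() queries the CPU (cores, dot-product/SVE features) and
    // instantiates the best matching kernel, e.g. an optimised assembly GEMM
    // when one exists for this data-type/architecture combination.
    NEGEMMLowpMatrixMultiplyCore gemmlowp;
    gemmlowp.configure(&a, &b, nullptr, &dst);

    a.allocator()->allocate();
    b.allocator()->allocate();
    dst.allocator()->allocate();

    gemmlowp.run(); // executes whichever kernel configure() selected
    return 0;
}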
Hope this helps
Output of 'strings libarm_compute.so | grep arm_compute_version':
arm_compute_version=v22.02 Build options: {'Werror': '1', 'debug': '1', 'neon': '1', 'opencl': '0', 'os': 'linux', 'arch': 'arm64-v8.2-a-sve'} Git hash=unknown
arm_compute_version.embed
Platform:
AArch64, Armv9, SVE supported
Operating System: Ubuntu 20.04.3 LTS
Problem description: Compiled "examples/neon_gemm_qasymm8.cpp" as a test.
cd build
./neon_gemm_qasymm8 16 16 12
The result was correct, but the GEMM kernel used the plain C++ implementation instead of the optimised assembly code.
Then I modified the code:
and in "src/cpu/kernels/CpuGemmLowpOffsetContributionOutputStageKernel.cpp", validate_arguments():
ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(output, 1, DataType::QASYMM8, DataType::QASYMM8_SIGNED, DataType::S32);
to force the kernel to use the optimised assembly algorithms, but now the result is not correct!
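For reference, a safer way to probe support than patching validate_arguments() is the operator's static validate(), which runs the same argument checks as configure(). A minimal sketch (again assuming the v22.02 API; shapes and quantization values are placeholders):

#include <iostream>
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/NEFunctions.h"

using namespace arm_compute;

int main()
{
    // Describe the intended QASYMM8 GEMM without allocating anything.
    const TensorInfo a(TensorShape(12U, 16U), 1, DataType::QASYMM8, QuantizationInfo(1.f / 255, 0));
    const TensorInfo b(TensorShape(16U, 12U), 1, DataType::QASYMM8, QuantizationInfo(1.f / 255, 0));
    const TensorInfo dst(TensorShape(16U, 16U), 1, DataType::S32);

    // An unsupported combination is reported here, instead of silently
    // producing wrong results after the checks have been patched out.
    const Status st = NEGEMMLowpMatrixMultiplyCore::validate(&a, &b, nullptr, &dst);
    if(st.error_code() != ErrorCode::OK)
    {
        std::cout << "unsupported: " << st.error_description() << std::endl;
    }
    return 0;
}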
Could anyone give me some hints on how to use the optimised int8 GEMM algorithms? Any tips are appreciated!