Example to use the bfloat16 kernel directly.

ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.


Hi @sunnykriplani

The recommended way to execute a64_hybrid_bf16fp32_dot_6x16 is through CpuGemm; see https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemm.h

At runtime, ACL chooses the best kernel from src/core/NEON/kernels/arm_gemm/kernels based on the CPU capabilities of the device.
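To illustrate the idea of capability-based selection, here is a minimal sketch in plain C++. It is not ACL code: `CpuFeatures`, `KernelCandidate`, `select_kernel` and the `generic_fallback` entry are hypothetical names made up for this example, and the real selection logic in arm_gemm considers more than a single feature flag.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical summary of CPU features probed at runtime (not an ACL type).
struct CpuFeatures {
    bool has_bf16 = false;  // BFloat16 extension, as found on Armv8.6-A
};

// One candidate kernel: a name plus a predicate saying whether it can run.
struct KernelCandidate {
    std::string name;
    std::function<bool(const CpuFeatures&)> is_supported;
};

// Candidates ordered from most to least specialised.
// "generic_fallback" is a placeholder, not a real arm_gemm kernel name.
inline const std::vector<KernelCandidate>& candidates() {
    static const std::vector<KernelCandidate> list = {
        {"a64_hybrid_bf16fp32_dot_6x16",
         [](const CpuFeatures& f) { return f.has_bf16; }},
        {"generic_fallback",
         [](const CpuFeatures&) { return true; }},
    };
    return list;
}

// Return the name of the first candidate the CPU supports,
// or an empty string if none matches.
inline std::string select_kernel(const CpuFeatures& features) {
    for (const auto& candidate : candidates()) {
        if (candidate.is_supported(features)) {
            return candidate.name;
        }
    }
    return "";
}
```

With `has_bf16` set, `select_kernel` returns the bf16 kernel name; otherwise it falls through to the placeholder entry.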

a64_hybrid_bf16fp32_dot_6x16 will be executed when you use CpuGemm and build for the armv8.6 target.
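As the kernel name suggests, it takes bfloat16 inputs and produces fp32 results. A scalar reference can be handy for checking results against what CpuGemm produces. The sketch below is only that, a reference under assumptions: `bf16_t`, `float_to_bf16`, `bf16_to_float` and `dot_bf16_fp32` are names invented here, the conversion uses round-to-nearest-even without special NaN handling, and the hardware instructions may round and accumulate differently, so compare with a tolerance rather than bit for bit.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bfloat16 stored as its raw 16-bit pattern (the top half of an fp32 value).
using bf16_t = std::uint16_t;

// Convert fp32 to bfloat16 with round-to-nearest-even.
// Assumption: NaN inputs are not handled specially in this sketch.
inline bf16_t float_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const std::uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<bf16_t>((bits + rounding) >> 16);
}

// Convert bfloat16 back to fp32 by placing it in the upper 16 bits.
inline float bf16_to_float(bf16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Dot product of two bfloat16 vectors, accumulated in fp32.
// Only the common prefix is used if the lengths differ.
inline float dot_bf16_fp32(const std::vector<bf16_t>& a,
                           const std::vector<bf16_t>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += bf16_to_float(a[i]) * bf16_to_float(b[i]);
    }
    return acc;
}
```

Small integers are exactly representable in bfloat16, so they make convenient test inputs when comparing against a GEMM output.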

ACL does not provide a public interface to the arm_gemm kernels. However, you can find out more about how to call these kernels directly by looking at CpuGemmAssemblyDispatch.h and CpuGemm; see _asm_glue in https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemm.cpp#L82

Hope this helps
