ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Example to use the bfloat16 kernel directly. #1015

Closed sunnykriplani closed 1 year ago

sunnykriplani commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version':

*Platform: bare-metal in C*

*Operating System: Windows*

I am trying to use the a64_hybrid_bf16fp32_dot_6x16 kernel to validate a bfloat16 feature, and I want to call the kernel code directly to perform the GEMM.

I cannot find an example of calling a kernel directly, and I do not understand the function's input arguments.

Is there an example of how to use the kernel function directly? Could someone also explain the input arguments of the kernel mentioned above?

thanks

morgolock commented 1 year ago

Hi @sunnykriplani

The recommended way of executing a64_hybrid_bf16fp32_dot_6x16 is through CpuGemm; see https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemm.h

At runtime, based on the CPU capabilities of the device, ACL chooses the best kernel from src/core/NEON/kernels/arm_gemm/kernels.

a64_hybrid_bf16fp32_dot_6x16 will be executed when you use CpuGemm and build for the armv8.6 target.
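
To illustrate the idea of capability-based selection described above, here is a minimal standalone sketch. It is not ACL code: `CpuFeatures`, `KernelEntry`, `select_kernel` and the fallback name `generic_fp32_fallback` are hypothetical names made up for this example, and the real arm_gemm selection logic takes more into account than a single feature flag.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical feature descriptor; this is NOT ACL's CPU info type.
struct CpuFeatures {
    bool has_bf16; // true if the CPU implements the BF16 extension
};

// Hypothetical table entry: a kernel name plus a predicate that says
// whether the kernel can run on the given CPU.
struct KernelEntry {
    std::string name;
    std::function<bool(const CpuFeatures&)> is_supported;
};

// Walk the candidates from most to least specialised and return the
// name of the first one whose predicate accepts the CPU.
inline std::string select_kernel(const CpuFeatures& cpu) {
    static const std::vector<KernelEntry> candidates = {
        {"a64_hybrid_bf16fp32_dot_6x16",
         [](const CpuFeatures& c) { return c.has_bf16; }},
        // Placeholder fallback name, invented for this sketch.
        {"generic_fp32_fallback",
         [](const CpuFeatures&) { return true; }},
    };
    for (const auto& k : candidates) {
        if (k.is_supported(cpu)) {
            return k.name;
        }
    }
    return std::string();
}
```

With this sketch, `select_kernel(CpuFeatures{true})` returns the bf16 kernel name, and a CPU without BF16 support falls through to the placeholder fallback.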

ACL does not provide a public interface to the arm_gemm kernels. However, you can find out more about how to call these kernels directly by looking at CpuGemmAssemblyDispatch.h and CpuGemm; see _asm_glue in https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemm.cpp#L82

Hope this helps
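
For the validation goal mentioned in the question, a plain scalar reference can be handy to compare results against. The sketch below is not the kernel and not part of ACL. It assumes row-major matrices, bfloat16 inputs stored as raw `uint16_t` values, and fp32 accumulation. The names `float_to_bf16`, `bf16_to_float` and `reference_gemm_bf16` are hypothetical helpers written for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Convert fp32 to bfloat16 (top 16 bits) with round-to-nearest-even.
// NaN inputs are not handled specially in this sketch.
inline uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>(bits >> 16);
}

// Widen bfloat16 back to fp32 by placing it in the upper 16 bits.
inline float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// C (M x N, fp32) = A (M x K, bf16) * B (K x N, bf16), all row-major.
inline std::vector<float> reference_gemm_bf16(const std::vector<uint16_t>& a,
                                              const std::vector<uint16_t>& b,
                                              std::size_t M, std::size_t N,
                                              std::size_t K) {
    std::vector<float> c(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f; // accumulate in fp32
            for (std::size_t k = 0; k < K; ++k) {
                acc += bf16_to_float(a[m * K + k]) * bf16_to_float(b[k * N + n]);
            }
            c[m * N + n] = acc;
        }
    }
    return c;
}
```

An optimised kernel may accumulate in a different order and round intermediate results differently, so it is safer to compare its output against this reference with a tolerance rather than bit for bit.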