Closed sunnykriplani closed 1 year ago
Hi @sunnykriplani
The recommended way of executing a64_hybrid_bf16fp32_dot_6x16 is by using CpuGemm; see https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemm.h
At runtime, based on the CPU capabilities of the device, ACL chooses the best kernel from src/core/NEON/kernels/arm_gemm/kernels.
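To illustrate the idea of capability-based selection described above, here is a minimal, self-contained sketch. It is not ACL's actual dispatch code: CpuFeatures, KernelCandidate, select_kernel, example_candidates and the fallback kernel name are all hypothetical names invented for this example. Only a64_hybrid_bf16fp32_dot_6x16 comes from the thread.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical CPU feature flags (assumption: not ACL's real feature-detection type).
struct CpuFeatures {
    bool has_bf16 = false;
};

// A candidate kernel: a name plus a predicate saying whether it can run on this CPU.
struct KernelCandidate {
    std::string name;
    std::function<bool(const CpuFeatures &)> is_supported;
};

// Return the name of the first supported candidate, or an empty string if none match.
// Candidates are assumed to be ordered from most to least preferred.
std::string select_kernel(const std::vector<KernelCandidate> &candidates,
                          const CpuFeatures &cpu) {
    for (const auto &candidate : candidates) {
        if (candidate.is_supported(cpu)) {
            return candidate.name;
        }
    }
    return "";
}

// Illustrative candidate list; "generic_fp32_fallback" is a made-up placeholder name.
std::vector<KernelCandidate> example_candidates() {
    return {
        {"a64_hybrid_bf16fp32_dot_6x16",
         [](const CpuFeatures &f) { return f.has_bf16; }},
        {"generic_fp32_fallback",
         [](const CpuFeatures &) { return true; }},
    };
}
```

With this shape, a device that reports BF16 support would get the bf16 kernel, and any other device would fall through to the placeholder fallback.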
a64_hybrid_bf16fp32_dot_6x16 will be executed when you use CpuGemm
and build for the target armv8.6.
ACL does not provide a public interface to the arm_gemm kernels; however, you can find out more about how to call these kernels directly by looking into CpuGemmAssemblyDispatch.h and CpuGemm; see _asm_glue in https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemm.cpp#L82
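Since the kernels have no public interface, one way to check bfloat16 results independently of ACL is a plain scalar reference. The sketch below is not the ACL kernel and does not reproduce its argument list; it only assumes, from the kernel name, bf16 inputs with fp32 accumulation and output. The names bf16_t, float_to_bf16, bf16_to_float and reference_gemm_bf16 are hypothetical, and row-major layout is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bfloat16 stored as the upper 16 bits of an IEEE-754 binary32 value.
using bf16_t = std::uint16_t;

// Convert float to bfloat16 with round-to-nearest-even.
// NaN handling is omitted for brevity (assumption: inputs are finite).
bf16_t float_to_bf16(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<bf16_t>(bits >> 16);
}

// Widen bfloat16 to float by placing it in the upper half of a 32-bit word.
float bf16_to_float(bf16_t value) {
    std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// C (m x n, fp32) = A (m x k, bf16) * B (k x n, bf16), all row-major,
// accumulating in fp32.
std::vector<float> reference_gemm_bf16(const std::vector<bf16_t> &a,
                                       const std::vector<bf16_t> &b,
                                       std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += bf16_to_float(a[i * k + p]) * bf16_to_float(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

Hardware bf16 dot-product instructions may round and order the accumulation differently from this loop, so it is probably safer to compare against CpuGemm output with a tolerance rather than bit-for-bit.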
Hope this helps
Output of 'strings libarm_compute.so | grep arm_compute_version':
*Platform: bare-metal in C*
*Operating System: Windows*
I am trying to use the a64_hybrid_bf16fp32_dot_6x16 kernel to validate a bfloat16 feature, and I want to call the kernel code directly to perform the GEMM.
I am unable to find an example of using a kernel directly, and I do not understand the function's input arguments.
Is there an example of how to use the kernel function directly, and could someone explain the input arguments of the kernel mentioned above?
Thanks