ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

How to use fixed format kernel? #1070

Closed Waylon-Zhu closed 9 months ago

Waylon-Zhu commented 10 months ago

I have seen fixed format kernels used in the oneDNN implementation, but I don't understand how the weight tensor info is processed, for example how the strides are handled. Is there any example code in C++?

Waylon-Zhu commented 10 months ago

In addition, I have observed that the fixed format kernels are slower than the ordinary hybrid kernels. Is this expected?

nSircombe commented 10 months ago

With fixed format kernels, the weights are ordered into the memory format expected by the asm kernel ahead of being passed into ACL. In a 'non fixed format' build, the weights are instead re-ordered inside Compute Library to match the format that the chosen kernel expects. The "fixed format" nomenclature comes from the fact that this build uses a collection of GEMM kernels with a common (i.e. fixed) weights format.

This is potentially less performant than having an optimised format for each kernel; however, it allows the responsibility for producing the memory format the kernel expects to be hoisted out of Compute Library and into oneDNN and TensorFlow. This is essential when dealing with cached oneDNN primitives (which TensorFlow uses), where the wei tensor of a cached primitive can get re-written. Without the fixed format kernels exposed in ACL and integrated into oneDNN, these re-written weights would not be ingested by the GEMM kernels, which would keep using the original weights (re-ordered into the required memory format), leading to incorrect results. In this context, the ability to use primitive caching in TensorFlow, via oneDNN, outweighs the performance penalty of relying on these fixed format kernels.
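For the C++ side of this, here is a minimal sketch of the query step: the caller asks ACL which fixed weight format the best kernel expects, then pads and orders the weights (and derives the wei tensor strides) to match. The shapes, the 1x1 convolution, and the `fixed_format_kernels=1` build option are illustrative assumptions, not something pinned down in this thread:

```cpp
// Assumes an ACL build with fixed format kernels enabled
// (scons option fixed_format_kernels=1).
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/functions/NEGEMMConvolutionLayer.h"

#include <iostream>

using namespace arm_compute;

int main()
{
    // Illustrative shapes: a 1x1 convolution, 64 input channels, 128 output
    // channels, NHWC layout. ACL's TensorShape lists dimensions
    // innermost-first, so NHWC is written as (C, W, H, N).
    TensorInfo src(TensorShape(64U, 56U, 56U, 1U), 1, DataType::F32);
    TensorInfo wei(TensorShape(64U, 1U, 1U, 128U), 1, DataType::F32);
    TensorInfo dst(TensorShape(128U, 56U, 56U, 1U), 1, DataType::F32);
    src.set_data_layout(DataLayout::NHWC);
    wei.set_data_layout(DataLayout::NHWC);
    dst.set_data_layout(DataLayout::NHWC);

    // Passing WeightFormat::ANY asks ACL to report, via the out-parameter,
    // which fixed weight format the selected kernel expects.
    WeightFormat      expected_wf = WeightFormat::UNSPECIFIED;
    const WeightsInfo query(false /* are_reshaped */, 1U, 1U, 128U /* num_kernels */,
                            false /* retain_internal_weights */, WeightFormat::ANY);
    const Status s = NEGEMMConvolutionLayer::has_opt_impl(
        expected_wf, &src, &wei, nullptr /* biases */, &dst,
        PadStrideInfo(1, 1, 0, 0), query);

    if(s.error_code() != ErrorCode::OK || !is_fixed_format(expected_wf))
    {
        std::cout << "no fixed format kernel available\n";
        return 1;
    }

    // A format such as OHWIo8i4 means: the O dimension is interleaved in
    // blocks of 8 and the I dimension blocked by 4. The caller pads O up to a
    // multiple of interleave_by(wf) and I up to a multiple of block_by(wf),
    // orders the weights accordingly, and computes the wei tensor strides from
    // that padded, re-ordered shape - which is what the oneDNN integration in
    // the PRs below does on its side.
    std::cout << "interleave_by = " << interleave_by(expected_wf)
              << ", block_by = "    << block_by(expected_wf) << "\n";
    return 0;
}
```

Roughly speaking, the linked oneDNN PR performs this kind of query when the primitive is created, so the weights only need to be laid out once and the cached primitive stays valid even if the wei memory is re-written.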

https://github.com/oneapi-src/oneDNN/pull/1590

https://github.com/tensorflow/tensorflow/pull/57987