Closed by SamanthaWangdl 1 month ago
Hi, I have a question regarding the optimization strategies: is the difference between scalar and auto_vectorization due to llc being run with -O0 versus -O3, or due to their different lowering passes?
If the intention is to compare the effect of llc at -O0 and -O3, you could name the benchmarks explicitly (e.g. scalar-llc-O0 and scalar-llc). If it is about comparing lowering passes, it seems to me that the auto-vectorization pipeline does not actually vectorize anything, since it uses -convert-linalg-to-loops, which lowers the linalg ops directly into scalar loops.
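To illustrate the point, here is a rough sketch of what -convert-linalg-to-loops does to a batch matmul (shapes and SSA names are made up for illustration; the constants %c0, %c1, etc. are assumed to be defined earlier):

```mlir
// Input: a linalg.batch_matmul on bufferized memrefs.
linalg.batch_matmul
    ins(%A, %B : memref<2x4x8xf32>, memref<2x8x4xf32>)
    outs(%C : memref<2x4x4xf32>)

// After -convert-linalg-to-loops: a purely scalar scf.for nest,
// one multiply-accumulate per innermost iteration, no vector ops.
scf.for %b = %c0 to %c2 step %c1 {
  scf.for %i = %c0 to %c4 step %c1 {
    scf.for %j = %c0 to %c4 step %c1 {
      scf.for %k = %c0 to %c8 step %c1 {
        %lhs = memref.load %A[%b, %i, %k] : memref<2x4x8xf32>
        %rhs = memref.load %B[%b, %k, %j] : memref<2x8x4xf32>
        %acc = memref.load %C[%b, %i, %j] : memref<2x4x4xf32>
        %mul = arith.mulf %lhs, %rhs : f32
        %sum = arith.addf %acc, %mul : f32
        memref.store %sum, %C[%b, %i, %j] : memref<2x4x4xf32>
      }
    }
  }
}
```

Since nothing in this output touches the vector dialect, any SIMD code would have to come from LLVM's own auto-vectorizer later, which only kicks in under llc/opt at higher optimization levels.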
To generate an auto-vectorized version, you can try integrating the existing buddy-mlir pass batchmatmul-optimize. Maybe you can try this lowering path:
```
--linalg-bufferize
--batchmatmul-optimize
--convert-linalg-to-loops
--func-bufferize
--arith-bufferize
--tensor-bufferize
--finalizing-bufferize
--lower-affine
--convert-scf-to-cf
--expand-strided-metadata
--convert-vector-to-llvm
--memref-expand
--arith-expand
--convert-arith-to-llvm
--finalize-memref-to-llvm
--convert-math-to-llvm
--llvm-request-c-wrappers
--convert-func-to-llvm
--reconcile-unrealized-casts
```
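Putting the flags together, the invocation could look roughly like this (the input/output file names and the buddy-opt / mlir-translate / llc tool split are assumptions based on the usual buddy-mlir benchmark flow, not something taken from this PR):

```shell
# Hypothetical file names; substitute the benchmark's actual sources.
buddy-opt batch_matmul.mlir \
    --linalg-bufferize \
    --batchmatmul-optimize \
    --convert-linalg-to-loops \
    --func-bufferize \
    --arith-bufferize \
    --tensor-bufferize \
    --finalizing-bufferize \
    --lower-affine \
    --convert-scf-to-cf \
    --expand-strided-metadata \
    --convert-vector-to-llvm \
    --memref-expand \
    --arith-expand \
    --convert-arith-to-llvm \
    --finalize-memref-to-llvm \
    --convert-math-to-llvm \
    --llvm-request-c-wrappers \
    --convert-func-to-llvm \
    --reconcile-unrealized-casts \
  | mlir-translate --mlir-to-llvmir -o batch_matmul.ll

# If you also want the llc -O0 vs -O3 comparison, compile the same
# LLVM IR twice so only the backend optimization level differs:
llc -O0 batch_matmul.ll -o batch_matmul_O0.s
llc -O3 batch_matmul.ll -o batch_matmul_O3.s
```

Feeding the identical .ll file to llc at both levels keeps the two questions (lowering-pass choice vs. backend optimization level) cleanly separated.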
The other code parts look good to me, thanks!
Add Batch Matmul Benchmark with CMake Integration to DeepLearning Ops