buddy-compiler / buddy-benchmark

Benchmark Framework for Buddy Projects
Apache License 2.0

[DeepLearning/Ops] Add Batch Matmul Benchmark with CMake Integration to DeepLearning Ops #133

SamanthaWangdl closed this pull request 1 month ago

SamanthaWangdl commented 1 month ago


Changes
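For reference, the computation the benchmark measures is a per-batch matrix product, C[b] += A[b] * B[b]. A minimal scalar reference implementation (a hypothetical helper for checking results, not the benchmark's actual code) might look like:

```cpp
#include <cassert>
#include <vector>

// Scalar batch matmul: A is [batch, M, K], B is [batch, K, N],
// C is [batch, M, N]; all stored row-major in flat vectors.
void batchMatmul(const std::vector<float> &A, const std::vector<float> &B,
                 std::vector<float> &C, int batch, int M, int N, int K) {
  for (int b = 0; b < batch; ++b)
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j) {
        float acc = C[(b * M + i) * N + j]; // accumulate into C
        for (int k = 0; k < K; ++k)
          acc += A[(b * M + i) * K + k] * B[(b * K + k) * N + j];
        C[(b * M + i) * N + j] = acc;
      }
}
```

For example, with batch=1, A=[[1,2],[3,4]], B=[[5,6],[7,8]] and C initialized to zero, the result is [[19,22],[43,50]].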

xlinsist commented 1 month ago

Hi, I have a question regarding the optimization strategies: is the difference between scalar and auto_vectorization due to llc being run with -O0 vs. -O3, or due to their different lowering passes?

If the intention is to compare the effect of llc with O0 and O3, you could name them explicitly (e.g. scalar-llc-O0 and scalar-llc). If it is about comparing lowering passes, it seems to me that the passes for auto-vectorization do not actually include vectorization, since the pipeline uses -convert-linalg-to-loops, which lowers the op directly into scalar loops.
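To illustrate why -convert-linalg-to-loops yields a scalar version: it rewrites the named op into a plain loop nest of scalar loads and stores. A sketch (shapes and constant names chosen for illustration, not taken from this PR):

```mlir
// Input: a named linalg op on memrefs.
linalg.batch_matmul
    ins(%A, %B : memref<2x3x4xf32>, memref<2x4x5xf32>)
    outs(%C : memref<2x3x5xf32>)

// After -convert-linalg-to-loops: a scalar scf loop nest, roughly:
scf.for %b = %c0 to %c2 step %c1 {
  scf.for %i = %c0 to %c3 step %c1 {
    scf.for %j = %c0 to %c5 step %c1 {
      scf.for %k = %c0 to %c4 step %c1 {
        %a  = memref.load %A[%b, %i, %k] : memref<2x3x4xf32>
        %bv = memref.load %B[%b, %k, %j] : memref<2x4x5xf32>
        %cv = memref.load %C[%b, %i, %j] : memref<2x3x5xf32>
        %m  = arith.mulf %a, %bv : f32
        %s  = arith.addf %cv, %m : f32
        memref.store %s, %C[%b, %i, %j] : memref<2x3x5xf32>
      }
    }
  }
}
```

No vector dialect ops appear in this output, so llc's auto-vectorizer is the only thing that could vectorize it afterwards.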

To generate an auto-vectorized version, you can try integrating the existing buddy-mlir pass batchmatmul-optimize. Maybe you can try this lowering path:

      --linalg-bufferize
      --batchmatmul-optimize
      --convert-linalg-to-loops
      --func-bufferize
      --arith-bufferize
      --tensor-bufferize
      --finalizing-bufferize
      --lower-affine
      --convert-scf-to-cf
      --expand-strided-metadata
      --convert-vector-to-llvm
      --memref-expand
      --arith-expand
      --convert-arith-to-llvm
      --finalize-memref-to-llvm
      --convert-math-to-llvm
      --llvm-request-c-wrappers
      --convert-func-to-llvm
      --reconcile-unrealized-casts
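The passes above can be driven end-to-end roughly like this (a sketch, assuming buddy-opt, mlir-translate, and llc are built and on PATH; file names are placeholders):

```shell
# Illustrative command line only; adjust file names and tool paths.
buddy-opt batch_matmul.mlir \
    --linalg-bufferize \
    --batchmatmul-optimize \
    --convert-linalg-to-loops \
    --func-bufferize \
    --arith-bufferize \
    --tensor-bufferize \
    --finalizing-bufferize \
    --lower-affine \
    --convert-scf-to-cf \
    --expand-strided-metadata \
    --convert-vector-to-llvm \
    --memref-expand \
    --arith-expand \
    --convert-arith-to-llvm \
    --finalize-memref-to-llvm \
    --convert-math-to-llvm \
    --llvm-request-c-wrappers \
    --convert-func-to-llvm \
    --reconcile-unrealized-casts |
  mlir-translate --mlir-to-llvmir |
  llc -O3 -filetype=obj -o batch_matmul_vec.o
```

Note that --convert-vector-to-llvm is placed after batchmatmul-optimize so that the vector ops the optimization pass introduces are lowered rather than left behind.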

The other code parts look good to me, thanks!