Work for the project in "Large Scale Data Engineering" at TU Berlin, WS 23/24. The project is supervised by @philipportner.
The work mainly concerns an extension to the MatMulLoweringPass of Daphne.
Several command line options are introduced:
--mlir-codegen lowers to the naive structure with three loops.
--matmul-tile attempts to find tile sizes such that registers and the L1, L2 and L3 caches are reused more efficiently.
--matmul-fixed-tile-sizes=4,4 specifies up to 5 tile sizes for a tiling scheme adapted from https://github.com/bondhugula/llvm-project/blob/hop/mlir/docs/HighPerfCodeGen.md.
--matmul-unroll-factor=2 additionally unrolls the innermost loop of the tiling scheme by up to the specified factor.
--matmul-vec-size-bits=64 attempts to use vector instructions with the specified bit width, using at least one element per vector. Vectorization is only applied if the resulting vector size divides the matrix size.
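To illustrate what the options above change, here is a minimal Python sketch, not the code of the pass itself. It contrasts the naive three-loop structure with a single level of tiling and models the vector-size rule as I read it from the description. The names `matmul_naive`, `matmul_tiled` and `vector_lanes` are hypothetical. The actual pass works on MLIR and may use several tiling levels.

```python
def matmul_naive(a, b):
    """Three nested loops, the structure described for --mlir-codegen."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile_i=4, tile_j=4, tile_k=4):
    """One level of loop tiling (assumption: the real scheme nests more levels)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops step over tiles; inner loops stay inside one tile,
    # so the same small blocks of a, b and c are reused while they are hot.
    for ii in range(0, n, tile_i):
        for jj in range(0, m, tile_j):
            for pp in range(0, k, tile_k):
                for i in range(ii, min(ii + tile_i, n)):
                    for j in range(jj, min(jj + tile_j, m)):
                        acc = c[i][j]
                        for p in range(pp, min(pp + tile_k, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def vector_lanes(vec_size_bits, element_bits, matrix_size):
    """Elements per vector, or None if vectorization would be skipped.

    Assumed reading of the description: at least one element per vector,
    and the vector size must divide the matrix size.
    """
    lanes = max(1, vec_size_bits // element_bits)
    return lanes if matrix_size % lanes == 0 else None


if __name__ == "__main__":
    a = [[float(i + j) for j in range(6)] for i in range(5)]
    b = [[float(i * j + 1) for j in range(7)] for i in range(6)]
    print(matmul_tiled(a, b) == matmul_naive(a, b))
    print(vector_lanes(64, 64, 10))
```

Both functions accumulate each output element in the same order, so their results match exactly. Only the memory access pattern differs.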
Some automated tests are added under test/api/cli/codegen. Not all of them pass currently.