This PR majorly reworks codegen for `AllAgg*` and `Ew*` ops and adds lowering for `TransposeOp` and `Row/ColAgg*`.
All of these passes are added to the optional MLIR codegen pipeline, which can be enabled using the `--mlir-codegen` flag, and offer an alternative lowering of these operations to MLIR rather than calls to precompiled C++ kernels. Currently, they only support `DenseMatrix` with dimensions that are known at compile time, and any value type except booleans.
Except for `IdxMin` and `IdxMax`, which are lowered directly to affine loops, and `TransposeOp`, which lowers to a named Linalg op, all passes make use of Linalg `GenericOp`s, which are then lowered to affine loops in a later pass of the codegen pipeline.
They convert the input `DenseMatrix` to a `MemRef` and create a new `MemRef` for the output, which is then converted back into a `DenseMatrix`.
Changes:
- Added codegen for `AllAgg*Op`, `Row/ColAgg*Op`, `Ew*Op`, and `TransposeOp` (see below for details)
- Added the passes to the TableGen files and the codegen pipeline
- Added script-level test cases and MLIR test cases (using FileCheck)
- Replaced old tests
- Renamed some old test scripts for `Ew*Op`s for better organization
- Edited the `fusion.mlir` test to lower Linalg to affine loops before applying the fusion pass
- Added canonicalization passes for `floor`, `ceil`, and `round` that remove the respective ops when the input type is an integer (this also simplifies codegen; see the sketch after this list)
- Added some necessary instantiations in `kernels.json`
- Restored alphabetical sorting of the codegen passes in `ir/daphneir/Passes.h`
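
To illustrate the `floor`/`ceil`/`round` canonicalization, here is a minimal hand-written sketch; the op and type spellings (`daphne.ewFloor`, `!daphne.Matrix<3x3xsi64>`) are assumptions for illustration, not verbatim from the code:

```mlir
// Hypothetical IR sketch: flooring an integer matrix is a no-op, so the
// canonicalizer can replace all uses of %r with %X and erase the op.
%r = "daphne.ewFloor"(%X)
       : (!daphne.Matrix<3x3xsi64>) -> !daphne.Matrix<3x3xsi64>
```

Removing such ops early means the later codegen passes never have to emit the integer cases for them.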
Ops with new codegen:
- `AllAgg*Op`: `Sum`, `Min`, `Max`
- `Row/ColAgg*Op`: `Sum`, `Min`, `Max`, `IdxMin`, `IdxMax`
- `Ew*Op` (unary): `Abs`, `Sqrt`, `Exp`, `Ln`, `Sin`, `Cos`, `Floor`, `Ceil`, `Round`
- `Ew*Op` (binary): `Add`, `Sub`, `Mul`, `Div`, `Pow`, `Max`, `Min`
- `TransposeOp`
A small example of a lowered kernel:
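
The snippet below is a hand-written sketch of roughly what the lowered IR for a column-wise sum could look like; the `daphne.*` op and type spellings, the shapes, and the exact Linalg syntax are illustrative assumptions, not verbatim pass output. For simplicity, the 1x3 result row is modeled as a 1-D `memref`.

```mlir
// Convert the input DenseMatrix to a MemRef and allocate the result.
%in  = "daphne.convertDenseMatrixToMemRef"(%X)
         : (!daphne.Matrix<3x3xf64>) -> memref<3x3xf64>
%res = memref.alloc() : memref<3xf64>

// First GenericOp: initialize the result by copying row 0 of the input.
%row0 = memref.subview %in[0, 0] [1, 3] [1, 1]
          : memref<3x3xf64> to memref<3xf64, strided<[1]>>
linalg.generic
    {indexing_maps = [affine_map<(j) -> (j)>, affine_map<(j) -> (j)>],
     iterator_types = ["parallel"]}
    ins(%row0 : memref<3xf64, strided<[1]>>)
    outs(%res : memref<3xf64>) {
  ^bb0(%v: f64, %init: f64):
    linalg.yield %v : f64
}

// Second GenericOp: reduce the remaining rows into the result with the
// aggregation function (an addition here).
%rest = memref.subview %in[1, 0] [2, 3] [1, 1]
          : memref<3x3xf64> to memref<2x3xf64, strided<[3, 1], offset: 3>>
linalg.generic
    {indexing_maps = [affine_map<(i, j) -> (i, j)>,
                      affine_map<(i, j) -> (j)>],
     iterator_types = ["reduction", "parallel"]}
    ins(%rest : memref<2x3xf64, strided<[3, 1], offset: 3>>)
    outs(%res : memref<3xf64>) {
  ^bb0(%v: f64, %acc: f64):
    %sum = arith.addf %v, %acc : f64
    linalg.yield %sum : f64
}
// Finally, the result MemRef is converted back into a DenseMatrix.
```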
The input is converted to a MemRef and a result MemRef is allocated. The first Linalg GenericOp initializes the result MemRef by copying the first row of the input, and the second GenericOp iterates over the remaining values and applies the aggregation operation, an addition in this case.
Known Limitations:
- Moving the `LoopFusionPass` below the `LinalgToAffineLoopsPass` already enables some loop fusions, but it seems to cause issues with e.g. `TransposeOp`. A simple example of this is `X = [1,2,3](1,); print(t(X)); print(t(t(X)));`. Hence, loop fusion has not been moved down yet.
- `Ew*Op` broadcasting for singleton matrices currently has no canonicalizer pass to always move the singleton matrix to the rhs operand (see the sketch after this list). This should be handled separately, though, to take broadcasting for the C++ kernels into account as well (see #803).
- Dimensions for codegen ops currently need to be known at compile time. This is due to the way `MemRefType` is currently handled during the conversion of the input `DenseMatrix` to a `MemRef`.
- `RewriteToCallKernelOpPass` currently fails if the IR contains `math.ipowi` or any trigonometric math op other than `sin` and `cos`, e.g. `no kernels registered for operation 'ipowi'`. Hence, the `ewBinaryPow` test currently fails; this should be fixed or commented out before merging. The same issue persists for the currently commented-out lowering of the trigonometric math ops `tan`, `asin`, `acos`, `atan`, `sinh`, `cosh`, and `tanh` in `EwOpsLowering.cpp`.
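
Regarding the missing broadcasting canonicalization, a minimal sketch of what such a rewrite could do for a commutative op (the op and type spellings are assumptions for illustration):

```mlir
// Hypothetical: the singleton matrix appears as the lhs operand ...
%r0 = "daphne.ewAdd"(%one, %X)
        : (!daphne.Matrix<1x1xf64>, !daphne.Matrix<3x3xf64>) -> !daphne.Matrix<3x3xf64>
// ... and a canonicalizer would swap the operands of the commutative op
// so that codegen only ever sees the singleton on the rhs:
%r1 = "daphne.ewAdd"(%X, %one)
        : (!daphne.Matrix<3x3xf64>, !daphne.Matrix<1x1xf64>) -> !daphne.Matrix<3x3xf64>
```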