Closed GuilloteauQ closed 1 month ago
Thanks for reporting this bug, @GuilloteauQ. I can reproduce it.
The problem is not that some size (#rows/#cols) is unknown at compile-time. Such unknowns can always happen for various reasons and DAPHNE can usually tolerate that. As you correctly found out, the exception is thrown in the EwBinaryMat
-kernel (elementwise binary operation on two matrices). The check in this kernel uses actual runtime shape information, i.e., it definitely knows the correct shapes of its inputs, because they have already been calculated at that point in time (no unknowns anymore).
Instead, the problem is related to the deduplication of pipeline inputs. Let's look at the following simpler example, which triggers the same bug:
N = 100;
G = fill(1.0, N, N);
p = fill(1.0, N, 1);
y = (G @ p) + p;
print(sum(y));
The expression (G @ p) + p
gets vectorized (double-check with --explain vectorized
). At runtime, the matrix G
(lhs input to the mat mul) is split into row segments of size (k x N) (where k is determined by the runtime scheduler). Currently, the rhs input of mat mul is always broadcasted (sth we should anyway improve at some point, since it can be done more smartly), i.e., p
is still (N x 1) in the pipeline. The result of the mat mul has shape (k x 1), and we want to add p
to it next. This elementwise addition assumes that it works on row segments, which is true for its lhs input (result of the mat mul), but not for its rhs input (since p
is broadcasted). At runtime, the elementwise addtition ends up to be (k x 1) +
(N x 1), which is not valid. Hence the error.
The bug is in the canonicalization of VectorizedPipelineOp
(see src/ir/daphneir/DaphneDialect.cpp
: mlir::LogicalResult mlir::daphne::VectorizedPipelineOp::canonicalize(...)
). This function deduplicates the pipeline inputs. If we comment out everything except for the return success();
at the end, the program runs fine even with --vec
. A look at --explain vectorized
reveals that %6
, the SSA value corresponding to p
, appears as a pipeline input twice ("daphne.vectorizedPipeline"(%5, %6, %3, %3, %6, ...)
), but with different splits (splits = [1, 0, 0, 0, 1]
, 0
: broadcast and 1
: by-rows), and these different inputs (arg1
, arg4
) are used accordingly within the pipeline.
IR after vectorization:
module {
func.func @main() {
%0 = "daphne.constant"() {value = 1 : index} : () -> index
%1 = "daphne.constant"() {value = 100 : index} : () -> index
%2 = "daphne.constant"() {value = true} : () -> i1
%3 = "daphne.constant"() {value = false} : () -> i1
%4 = "daphne.constant"() {value = 1.000000e+00 : f64} : () -> f64
%5 = "daphne.fill"(%4, %1, %1) : (f64, index, index) -> !daphne.Matrix<100x100xf64>
%6 = "daphne.fill"(%4, %1, %0) : (f64, index, index) -> !daphne.Matrix<100x1xf64>
%7:2 = "daphne.vectorizedPipeline"(%5, %6, %3, %3, %6, %1, %1, %0, %0) ({
^bb0(%arg0: !daphne.Matrix<?x100xf64>, %arg1: !daphne.Matrix<100x1xf64>, %arg2: i1, %arg3: i1, %arg4: !daphne.Matrix<?x1xf64>):
%9 = "daphne.matMul"(%arg0, %arg1, %arg2, %arg3) : (!daphne.Matrix<?x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1) -> !daphne.Matrix<?x?xf64>
%10 = "daphne.ewAdd"(%9, %arg4) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x1xf64>) -> !daphne.Matrix<?x?xf64>
"daphne.return"(%9, %10) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x?xf64>) -> ()
}, {
}) {combines = [1, 1], operand_segment_sizes = array<i32: 5, 2, 2, 0>, splits = [1, 0, 0, 0, 1]} : (!daphne.Matrix<100x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1, !daphne.Matrix<100x1xf64>, index, index, index, index) -> (!daphne.Matrix<100x1xf64>, !daphne.Matrix<100x1xf64>)
%8 = "daphne.sumAll"(%7#1) : (!daphne.Matrix<100x1xf64>) -> f64
"daphne.print"(%8, %2, %3) : (f64, i1, i1) -> ()
"daphne.return"() : () -> ()
}
}
To solve this problem, the canonicalization of VectorizedPipelineOp
must be fixed. I haven't looked into it in detail, but I assume it does not take the splits into account correctly. A duplicate input value may only be removed if both occurrences of the value are associated with the same split mode (broadcast or by-row). In that sense, I assume the problem is quite isolated to that function. Would you like to give fixing it a try, @GuilloteauQ?
By the way, the reason why the workaround with reshape()
works is that ReshapeOp
is currently not vectorized. Thus, there is no duplicate input to the pipeline, since the original p
and the result of reshape(p)
are technically different SSA values.
IR after vectorization:
module {
func.func @main() {
%0 = "daphne.constant"() {value = 1 : index} : () -> index
%1 = "daphne.constant"() {value = 100 : index} : () -> index
%2 = "daphne.constant"() {value = true} : () -> i1
%3 = "daphne.constant"() {value = false} : () -> i1
%4 = "daphne.constant"() {value = 1.000000e+00 : f64} : () -> f64
%5 = "daphne.fill"(%4, %1, %1) : (f64, index, index) -> !daphne.Matrix<100x100xf64>
%6 = "daphne.fill"(%4, %1, %0) : (f64, index, index) -> !daphne.Matrix<100x1xf64>
%7 = "daphne.reshape"(%6, %1, %0) : (!daphne.Matrix<100x1xf64>, index, index) -> !daphne.Matrix<100x1xf64>
%8:2 = "daphne.vectorizedPipeline"(%5, %6, %3, %3, %7, %1, %1, %0, %0) ({
^bb0(%arg0: !daphne.Matrix<?x100xf64>, %arg1: !daphne.Matrix<100x1xf64>, %arg2: i1, %arg3: i1, %arg4: !daphne.Matrix<?x1xf64>):
%10 = "daphne.matMul"(%arg0, %arg1, %arg2, %arg3) : (!daphne.Matrix<?x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1) -> !daphne.Matrix<?x?xf64>
%11 = "daphne.ewAdd"(%10, %arg4) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x1xf64>) -> !daphne.Matrix<?x?xf64>
"daphne.return"(%10, %11) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x?xf64>) -> ()
}, {
}) {combines = [1, 1], operand_segment_sizes = array<i32: 5, 2, 2, 0>, splits = [1, 0, 0, 0, 1]} : (!daphne.Matrix<100x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1, !daphne.Matrix<100x1xf64>, index, index, index, index) -> (!daphne.Matrix<100x1xf64>, !daphne.Matrix<100x1xf64>)
%9 = "daphne.sumAll"(%8#1) : (!daphne.Matrix<100x1xf64>) -> f64
"daphne.print"(%9, %2, %3) : (f64, i1, i1) -> ()
"daphne.return"() : () -> ()
}
}
Thank you @pdamme for the detailed explanation of the real problem 🙂 I can give it a try, yes! But that would only be in a few weeks (~mid-May I reckon)
I stumbled onto some error recently where the vectorized engine does not seem able to "guess" the dimensions of a matrix and then crashes.
Minimal (non) working example
Error message
When i run the above script without the vectorized engine: no problem
But with the vectorized engine (
daphne --vec --select-matrix-repr mwe.daph
) i get:The error message comes from: https://github.com/daphne-eu/daphne/blob/e7649ba6bdc629a03936e49c27ed876efbce6db4/src/runtime/local/kernels/EwBinaryMat.h#L103
IR
The IR given by
daphne --vec -explain=vectorized --select-matrix-repr mwe.daph
:It seems that at
%16
the dimension of the resulting vector should be known but isnt...trick
for now, my "trick" is to
reshape
the vector to its actual size: