daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0
67 stars 62 forks source link

Bug related to deduplication of vectorized pipeline inputs #697

Closed GuilloteauQ closed 1 month ago

GuilloteauQ commented 7 months ago

I stumbled onto some error recently where the vectorized engine does not seem able to "guess" the dimensions of a matrix and then crashes.

Minimal (non) working example

# mwe.daph
N = 100000;
G = rand(N, N, 1.0, 1.0, 0.000001, -1);
p = fill(1.0, N, 1);

alpha = 0.85;
one_minus_alpha = 1.0 - alpha;

p = alpha * (G @ p) + one_minus_alpha * p;

print(p[0,0]);

Error message

When i run the above script without the vectorized engine: no problem

But with the vectorized engine (daphne --vec --select-matrix-repr mwe.daph) i get:

terminate called after throwing an instance of 'std::runtime_error'
  what():  EwBinaryMat(Dense) - lhs and rhs must either have the same dimensions, or one of them must be a row/column vector with the width/height of the other
./daphne-src/bin/daphne(+0x16e4e22)[0x5576b879de22]
/lib/x86_64-linux-gnu/libc.so.6(+0x43090)[0x2b06dbee8090]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x2b06dbee800b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x2b06dbec7859]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e8d1)[0x2b06dbbf78d1]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa37c)[0x2b06dbc0337c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3e7)[0x2b06dbc033e7]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa699)[0x2b06dbc03699]
/users/staff/dmi-dmi/guillo0001/daphnesched2/daphne-src/bin/../lib/libAllKernels.so(+0x72401c)[0x2b06f9dc401c]
[0x2b06da54c0ef]
/users/staff/dmi-dmi/guillo0001/daphnesched2/daphne-src/bin/../lib/libAllKernels.so(_ZN20CompiledPipelineTaskI11DenseMatrixIdEE7executeEjj+0x31f)[0x2b06f9ecdbef]
/users/staff/dmi-dmi/guillo0001/daphnesched2/daphne-src/bin/../lib/libAllKernels.so(+0x811b89)[0x2b06f9eb1b89]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4)[0x2b06dbc2fdf4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x2b06da562609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x2b06dbfc4353]
[error]: Got an abort signal from the execution engine. Most likely an exception in a shared library. Check logs!
[error]: Execution error: Returning from signal 6
Segmentation fault (core dumped)

The error message comes from: https://github.com/daphne-eu/daphne/blob/e7649ba6bdc629a03936e49c27ed876efbce6db4/src/runtime/local/kernels/EwBinaryMat.h#L103

IR

The IR given by daphne --vec -explain=vectorized --select-matrix-repr mwe.daph:

IR after vectorization:
module {
  func.func @main() {
    %0 = "daphne.constant"() {value = 1 : index} : () -> index
    %1 = "daphne.constant"() {value = 100000 : index} : () -> index
    %2 = "daphne.constant"() {value = 1 : si64} : () -> si64
    %3 = "daphne.constant"() {value = 0.15000000000000002 : f64} : () -> f64
    %4 = "daphne.constant"() {value = true} : () -> i1
    %5 = "daphne.constant"() {value = 0 : si64} : () -> si64
    %6 = "daphne.constant"() {value = false} : () -> i1
    %7 = "daphne.constant"() {value = 8.500000e-01 : f64} : () -> f64
    %8 = "daphne.constant"() {value = 1.000000e+00 : f64} : () -> f64
    %9 = "daphne.constant"() {value = 9.9999999999999995E-7 : f64} : () -> f64
    %10 = "daphne.constant"() {value = -1 : si64} : () -> si64
    %11 = "daphne.randMatrix"(%1, %1, %8, %8, %9, %10) : (index, index, f64, f64, f64, si64) -> !daphne.Matrix<100000x100000xf64:sp[9.9999999999999995E-7]:rep[sparse]>
    %12 = "daphne.fill"(%8, %1, %0) : (f64, index, index) -> !daphne.Matrix<100000x1xf64>
    %13 = "daphne.vectorizedPipeline"(%11, %12, %6, %3, %7, %1, %0) ({
    ^bb0(%arg0: !daphne.Matrix<?x100000xf64:sp[9.9999999999999995E-7]:rep[sparse]>, %arg1: !daphne.Matrix<100000x1xf64>, %arg2: i1, %arg3: f64, %arg4: f64):
      %16 = "daphne.matMul"(%arg0, %arg1, %arg2, %arg2) : (!daphne.Matrix<?x100000xf64:sp[9.9999999999999995E-7]:rep[sparse]>, !daphne.Matrix<100000x1xf64>, i1, i1) -> !daphne.Matrix<?x?xf64>
      %17 = "daphne.ewMul"(%arg1, %arg3) : (!daphne.Matrix<100000x1xf64>, f64) -> !daphne.Matrix<?x?xf64>
      %18 = "daphne.ewMul"(%16, %arg4) : (!daphne.Matrix<?x?xf64>, f64) -> !daphne.Matrix<?x?xf64>
      %19 = "daphne.ewAdd"(%18, %17) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x?xf64>) -> !daphne.Matrix<?x?xf64>
      "daphne.return"(%19) : (!daphne.Matrix<?x?xf64>) -> ()
    }, {
    }) {combines = [1], operand_segment_sizes = array<i32: 5, 1, 1, 0>, splits = [1, 0, 0, 0, 0]} : (!daphne.Matrix<100000x100000xf64:sp[9.9999999999999995E-7]:rep[sparse]>, !daphne.Matrix<100000x1xf64>, i1, f64, f64, index, index) -> !daphne.M
atrix<100000x1xf64>
    %14 = "daphne.sliceRow"(%13, %5, %2) : (!daphne.Matrix<100000x1xf64>, si64, si64) -> !daphne.Matrix<1x1xf64>
    %15 = "daphne.sliceCol"(%14, %5, %2) : (!daphne.Matrix<1x1xf64>, si64, si64) -> !daphne.Matrix<1x1xf64>
    "daphne.print"(%15, %4, %6) : (!daphne.Matrix<1x1xf64>, i1, i1) -> ()
    "daphne.return"() : () -> ()
  }
}

It seems that at %16 the dimension of the resulting vector should be known but isnt...

trick

for now, my "trick" is to reshape the vector to its actual size:

N = 100000;
G = rand(N, N, 1.0, 1.0, 0.000001, -1);
p = fill(1.0, N, 1);

alpha = 0.85;
one_minus_alpha = 1.0 - alpha;

- p = alpha * (G @ p) + one_minus_alpha * p;
+ p = alpha * (G @ p) + one_minus_alpha * reshape(p, N, 1);

print(p[0,0]);
pdamme commented 7 months ago

Thanks for reporting this bug, @GuilloteauQ. I can reproduce it.

The problem is not that some size (#rows/#cols) is unknown at compile-time. Such unknowns can always happen for various reasons and DAPHNE can usually tolerate that. As you correctly found out, the exception is thrown in the EwBinaryMat-kernel (elementwise binary operation on two matrices). The check in this kernel uses actual runtime shape information, i.e., it definitely knows the correct shapes of its inputs, because they have already been calculated at that point in time (no unknowns anymore).

Instead, the problem is related to the deduplication of pipeline inputs. Let's look at the following simpler example, which triggers the same bug:

N = 100;
G = fill(1.0, N, N);
p = fill(1.0, N, 1);
y = (G @ p) + p;
print(sum(y));

The expression (G @ p) + p gets vectorized (double-check with --explain vectorized). At runtime, the matrix G (lhs input to the mat mul) is split into row segments of size (k x N) (where k is determined by the runtime scheduler). Currently, the rhs input of mat mul is always broadcasted (sth we should anyway improve at some point, since it can be done more smartly), i.e., p is still (N x 1) in the pipeline. The result of the mat mul has shape (k x 1), and we want to add p to it next. This elementwise addition assumes that it works on row segments, which is true for its lhs input (result of the mat mul), but not for its rhs input (since p is broadcasted). At runtime, the elementwise addtition ends up to be (k x 1) + (N x 1), which is not valid. Hence the error.

The bug is in the canonicalization of VectorizedPipelineOp (see src/ir/daphneir/DaphneDialect.cpp: mlir::LogicalResult mlir::daphne::VectorizedPipelineOp::canonicalize(...)). This function deduplicates the pipeline inputs. If we comment out everything except for the return success(); at the end, the program runs fine even with --vec. A look at --explain vectorized reveals that %6, the SSA value corresponding to p, appears as a pipeline input twice ("daphne.vectorizedPipeline"(%5, %6, %3, %3, %6, ...)), but with different splits (splits = [1, 0, 0, 0, 1], 0: broadcast and 1: by-rows), and these different inputs (arg1, arg4) are used accordingly within the pipeline.

IR after vectorization:

module {
  func.func @main() {
    %0 = "daphne.constant"() {value = 1 : index} : () -> index
    %1 = "daphne.constant"() {value = 100 : index} : () -> index
    %2 = "daphne.constant"() {value = true} : () -> i1
    %3 = "daphne.constant"() {value = false} : () -> i1
    %4 = "daphne.constant"() {value = 1.000000e+00 : f64} : () -> f64
    %5 = "daphne.fill"(%4, %1, %1) : (f64, index, index) -> !daphne.Matrix<100x100xf64>
    %6 = "daphne.fill"(%4, %1, %0) : (f64, index, index) -> !daphne.Matrix<100x1xf64>
    %7:2 = "daphne.vectorizedPipeline"(%5, %6, %3, %3, %6, %1, %1, %0, %0) ({
    ^bb0(%arg0: !daphne.Matrix<?x100xf64>, %arg1: !daphne.Matrix<100x1xf64>, %arg2: i1, %arg3: i1, %arg4: !daphne.Matrix<?x1xf64>):
      %9 = "daphne.matMul"(%arg0, %arg1, %arg2, %arg3) : (!daphne.Matrix<?x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1) -> !daphne.Matrix<?x?xf64>
      %10 = "daphne.ewAdd"(%9, %arg4) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x1xf64>) -> !daphne.Matrix<?x?xf64>
      "daphne.return"(%9, %10) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x?xf64>) -> ()
    }, {
    }) {combines = [1, 1], operand_segment_sizes = array<i32: 5, 2, 2, 0>, splits = [1, 0, 0, 0, 1]} : (!daphne.Matrix<100x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1, !daphne.Matrix<100x1xf64>, index, index, index, index) -> (!daphne.Matrix<100x1xf64>, !daphne.Matrix<100x1xf64>)
    %8 = "daphne.sumAll"(%7#1) : (!daphne.Matrix<100x1xf64>) -> f64
    "daphne.print"(%8, %2, %3) : (f64, i1, i1) -> ()
    "daphne.return"() : () -> ()
  }
}

To solve this problem, the canonicalization of VectorizedPipelineOp must be fixed. I haven't looked into it in detail, but I assume it does not take the splits into account correctly. A duplicate input value may only be removed if both occurrences of the value are associated with the same split mode (broadcast or by-row). In that sense, I assume the problem is quite isolated to that function. Would you like to give fixing it a try, @GuilloteauQ?


By the way, the reason why the workaround with reshape() works is that ReshapeOp is currently not vectorized. Thus, there is no duplicate input to the pipeline, since the original p and the result of reshape(p) are technically different SSA values.

IR after vectorization:

module {
  func.func @main() {
    %0 = "daphne.constant"() {value = 1 : index} : () -> index
    %1 = "daphne.constant"() {value = 100 : index} : () -> index
    %2 = "daphne.constant"() {value = true} : () -> i1
    %3 = "daphne.constant"() {value = false} : () -> i1
    %4 = "daphne.constant"() {value = 1.000000e+00 : f64} : () -> f64
    %5 = "daphne.fill"(%4, %1, %1) : (f64, index, index) -> !daphne.Matrix<100x100xf64>
    %6 = "daphne.fill"(%4, %1, %0) : (f64, index, index) -> !daphne.Matrix<100x1xf64>
    %7 = "daphne.reshape"(%6, %1, %0) : (!daphne.Matrix<100x1xf64>, index, index) -> !daphne.Matrix<100x1xf64>
    %8:2 = "daphne.vectorizedPipeline"(%5, %6, %3, %3, %7, %1, %1, %0, %0) ({
    ^bb0(%arg0: !daphne.Matrix<?x100xf64>, %arg1: !daphne.Matrix<100x1xf64>, %arg2: i1, %arg3: i1, %arg4: !daphne.Matrix<?x1xf64>):
      %10 = "daphne.matMul"(%arg0, %arg1, %arg2, %arg3) : (!daphne.Matrix<?x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1) -> !daphne.Matrix<?x?xf64>
      %11 = "daphne.ewAdd"(%10, %arg4) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x1xf64>) -> !daphne.Matrix<?x?xf64>
      "daphne.return"(%10, %11) : (!daphne.Matrix<?x?xf64>, !daphne.Matrix<?x?xf64>) -> ()
    }, {
    }) {combines = [1, 1], operand_segment_sizes = array<i32: 5, 2, 2, 0>, splits = [1, 0, 0, 0, 1]} : (!daphne.Matrix<100x100xf64>, !daphne.Matrix<100x1xf64>, i1, i1, !daphne.Matrix<100x1xf64>, index, index, index, index) -> (!daphne.Matrix<100x1xf64>, !daphne.Matrix<100x1xf64>)
    %9 = "daphne.sumAll"(%8#1) : (!daphne.Matrix<100x1xf64>) -> f64
    "daphne.print"(%9, %2, %3) : (f64, i1, i1) -> ()
    "daphne.return"() : () -> ()
  }
}
GuilloteauQ commented 7 months ago

Thank you @pdamme for the detailed explanation of the real problem 🙂 I can give it a try, yes! But that would only be in a few weeks (~mid-May I reckon)