daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0
67 stars 62 forks source link

Possible bug in copy-on-write mechanics? #625

Closed divjakm closed 5 months ago

divjakm commented 1 year ago

I think I found a bug in the copy-on-write mechanics, as demonstrated by this code:

A = as.matrix<f64>(fill(1,3,3));  // 3 x 3 matrix filled with value 1
print("A before:");
print(A);

n = ncol(A);    // number of columns in A
//B = as.matrix<f64>(fill (0, n, n)); // INIT 1: B is new matrix with the same size as A, works correctly
B = A;                              // INIT 2: B is copy of A, possible wrong behavior?

for (i in 0 : n - 1)  // for every column of A
{
  //x = as.matrix<f64>(fill(3,n,1));  // VER 1: x = n x 1 column independent of A: works with both INIT variants
  x = sum(A,0);                     // VER 2: x = n x 1 column calculated from A (row sum): is wrong for INIT 2
  B[:,i] = x;  // store column x into B
}

print("B:");
print(B);   // B should be filled with value 3, is wrong for VER 2 and INIT 2
print("A after:");
print(A);   // A doesn't change in all versions, which is correct

Tha matrix A should never change, while the resulting matrix B should be filled with value 3 (as is in the case of INIT 1 code):

DenseMatrix(3x3, double)
3 3 3
3 3 3
3 3 3

However, in case of INIT 2 code B contains the following values:

DenseMatrix(3x3, double)
3 5 9
3 5 9
3 5 9

It seems that when calculating row sum of A the data is somehow modified by data previously stored in B?

divjakm commented 1 year ago

Additionaly, making an explicit copy of matrix A to initialize B like this:

B = copy(A); // INIT 3: B is copy of A

throws the following error:

JIT session error: Symbols not found: [ _copy__DenseMatrix_double__DenseMatrix_double ]
JIT-Engine invocation failed: Failed to materialize symbols: { (main, { _mlir_func-1-1, func-1-1, _mlir_ciface_main, _mlir__mlir_ciface_func-1-1, _mlir_main, _mlir_ciface_func-1-1, _mlir__mlir_ciface_main, main }) }Program aborted due to an unhandled Error:
Failed to materialize symbols: { (main, { _mlir_func-1-1, func-1-1, _mlir_ciface_main, _mlir__mlir_ciface_func-1-1, _mlir_main, _mlir_ciface_func-1-1, _mlir__mlir_ciface_main, main }) }
[error]: Got an abort signal from the execution engine. Most likely an exception in a shared library. Check logs!
[error]: Execution error: Returning from signal 6
pdamme commented 5 months ago

Thanks a lot for reporting this bug, @divjakm. And sorry for the late response. This issue must have escaped our attention somehow. Yes, I can confirm the described behavior (even with the current state of DAPHNE), and it is indeed a bug. You are right, A should be a 3x3 matrix filled with 1s and B should be a 3x3 matrix filled with 3s for every combination of {INIT1, INIT2} x {VER1, VER2}. We will further investigate and fix this...

pdamme commented 5 months ago

@divjakm: This bug is fixed now and we have a test case that is a slightly simplified variant of your example now.