daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

Connected Components fails when reading an input matrix. #617

Open aristotelis96 opened 1 year ago

aristotelis96 commented 1 year ago

Setup

scripts/algorithms/components.daph generates a random matrix and runs the connected components algorithm.

Instead of generating a matrix, we can also read an existing matrix with the readMatrix operation. We only need to comment out the randomly generated input and use the readMatrix op:

# read sparse graph 
G = readMatrix($G);
# n = $n;
# e = $e;
# UT = upperTri(rand(n, n, 1.0, 1.0, 2.0*e/n^2.0, -1), false, false);
# G = UT + t(UT);

Example

./bin/daphne scripts/algorithms/components.daph G=\"amazon.mtx\" 

Issue

However, this causes daphne to crash with a segmentation fault:

daphne: /home/avontzalidis/prototype/thirdparty/installed/include/mlir/Pass/Pass.h:169: mlir::detail::PassExecutionState& mlir::Pass::getPassState(): Assertion `passState && "pass state was never initialized"' failed.
Segmentation fault (core dumped)

By re-running the same command multiple times, I was able to get a more helpful error (roughly once every ~50 tries):

daphne: /home/avontzalidis/prototype/thirdparty/installed/include/mlir/Pass/Pass.h:169: mlir::detail::PassExecutionState& mlir::Pass::getPassState(): Assertion `passState && "pass state was never initialized"' failed.
[error]: Exception in /home/avontzalidis/prototype/src/compiler/inference/InferencePass.cpp:165: 
type inference returned an unknown result type for some op, but partial inference is not allowed at this point: scf.while
[error]: Got an abort signal from the execution engine. Most likely an exception in a shared library. Check logs!
Segmentation fault (core dumped)
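
Since the more detailed message only shows up occasionally, rerunning by hand gets tedious. Below is a minimal Python sketch for automating the retries. The helper name `rerun_until` and the choice to search both stdout and stderr are my own assumptions and not part of DAPHNE; I have not checked which stream the `[error]` lines are written to.

```python
import subprocess
import sys


def rerun_until(cmd, needle, max_tries=200):
    """Run cmd repeatedly until its output contains needle.

    Returns (attempt_number, combined_output) on the first match,
    or None if needle never appeared within max_tries runs.
    Hypothetical helper, not part of DAPHNE.
    """
    for attempt in range(1, max_tries + 1):
        # No check=True: a crash (non-zero or signal exit) is expected here.
        proc = subprocess.run(cmd, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        if needle in output:
            return attempt, output
    return None


# Possible usage for this issue (command taken from the example above):
#   cmd = ["./bin/daphne", "scripts/algorithms/components.daph", 'G="amazon.mtx"']
#   hit = rerun_until(cmd, "[error]")
#   print(hit[1] if hit else "no detailed error captured", file=sys.stderr)
```

The list form of `cmd` passes `G="amazon.mtx"` with literal quotes, which is what the escaped `\"` in the shell command above produces.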

Notes

  1. We should have an additional test case for the readMatrix version of connected components, when this is fixed.
  2. Before commit 2fc8a9ee572be441ffc72528bb1ec20f7ee7f478 this used to work.
  3. A quick fix that makes the script work again is to replace the while loop with a for loop. Of course, this means you can no longer terminate the execution when diff==0; instead, you always run a fixed number of iterations:
    # while( as.si64(diff > 0.0) && (maxi==0 || iter<=maxi) ) {
    for (iter in 1:maxi) {
    ...
    # iter = iter + 1; # Remove this
    }
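
To illustrate what the workaround in note 3 gives up, here is a small Python sketch (not DaphneDSL) of the two loop shapes. It assumes the algorithm is a max-label propagation where every vertex repeatedly takes the largest label among itself and its neighbours; the function names and the adjacency-list input are hypothetical and only serve the illustration.

```python
def propagate(adj, labels):
    """One step: each vertex takes the max label of itself and its neighbours."""
    return [max([labels[v]] + [labels[u] for u in adj[v]])
            for v in range(len(adj))]


def components_while(adj, maxi=0):
    """While-loop shape: stop as soon as no label changed (diff == 0).

    maxi == 0 means "no iteration limit", mirroring the loop condition above.
    Returns (labels, iterations_executed).
    """
    labels = list(range(1, len(adj) + 1))
    diff, it = 1, 1
    while diff > 0 and (maxi == 0 or it <= maxi):
        new = propagate(adj, labels)
        diff = sum(a != b for a, b in zip(new, labels))
        labels = new
        it += 1
    return labels, it - 1


def components_for(adj, maxi):
    """For-loop workaround: always runs exactly maxi iterations."""
    labels = list(range(1, len(adj) + 1))
    for _ in range(maxi):
        labels = propagate(adj, labels)
    return labels, maxi


# Two components: a path 0-1-2 and an edge 3-4.
adj = [[1], [0, 2], [1], [4], [3]]
print(components_while(adj))   # converges and stops on its own
print(components_for(adj, 10)) # same labels, but always 10 iterations
print(components_for(adj, 1))  # too few iterations: labels not yet converged
```

With the for loop, maxi has to be positive and large enough for convergence; too small a value silently returns unconverged labels, and too large a value only costs extra iterations.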

MarcusParadies commented 1 year ago

Thanks, I can confirm this. Sadly, there is even a test components_read.daphne, which is just not integrated into the test suite. I'll dig into this a bit further.

corepointer commented 1 year ago

I also ran into this and I believe I have a fix for this (the "returned an unknown result type" error). I'll pull that out of my wip into a separate commit later today and let you have a look at it :eyes:

MarcusParadies commented 1 year ago

I didn't start looking into this yet, so please go ahead!

aristotelis96 commented 1 year ago

I noticed that pca.daph does not seem to work either; it crashes with a segmentation fault. Unfortunately, I was not able to get a more useful error message.

PCA example that used to work before 2fc8a9e:

bin/daphne scripts/algorithms/pca.daph X=\"data/wine.csv\" K=2 center=true scale=false Xout=\"outX.csv\" Mout=\"outM.csv\"

corepointer commented 1 year ago

I fixed the pca.daph issue (see #637). The original issue with components_read.daph is still pending, as the fix I have for this problem does not work in all the cases I tested.