Sparsity estimation for most DaphneIR ops

daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Apache License 2.0

67 stars 62 forks source link

Sparsity estimation for most DaphneIR ops #766

Open pdamme opened 5 months ago

pdamme commented 5 months ago

In order to offer physical data independence to users, DAPHNE must be able to automatically decide the physical representation of a data object. In this context, the decision for a dense or a sparse representation of a matrix is especially crucial for linear algebra operations. This decision is based on sparsity estimates for the intermediate results. DAPHNE can already make this decision for some DaphneIR operations, but to support informed decisions throughout a larger program, we need sparsity estimation for the results of (virtually) all DaphneIR operations.

Hints:

For some general background on sparsity estimation, see Matthias Boehm's AMLS lecture at TU Berlin: 03_Compilation.pdf (slide 10ff).
Get familiar with MLIR traits and interfaces, since we use these concepts for type and property inference in the DAPHNE compiler.
Get familiar with type and property inference in the DAPHNE compiler.
- For a high-level overview: DAPHNE project deliverables D3.3 and D3.4.
- Traits and interfaces for the type, shape, sparsity, etc. can be found in src/ir/daphneir/: DaphneInfer*OpInterface.td/h/cpp and DaphneInfer*Traits.td.
- These traits and interfaces are attached to the DaphneIR ops in src/ir/daphneir/DaphneOps.td.
- The actual type and property inference happens in the inference pass (src/compiler/inference/InferencePass.cpp).
- Given a DaphneDSL script, the infered types and properties can be displayed by bin/daphne --explain property_inference myScript.daphne.
Think of straightforward ways to estimate the sparsity of the DaphneIR ops' results based on the type, shape, sparsity of their arguments. In many cases, it is not possible to know the exact sparsity based on this information. Nevertheless, for now, such naive metadata estimators are fine. Make simplifying assumptions where detailed information is missing, e.g., uniform distribution of the non-zeros within a matrix. Note that any type or property of the arguments may be unknown (usually represented by a value of -1. The sparsity estimation must produce meaningful results even in presence of unknowns. In the worst case, it should return unknown (-1) itself.
Implement sparsity estimation for most DaphneIR ops by either (a) attaching the sparsity estimation interface to the op and implementing a dedicated estimator, or (b) attaching one of the sparsity estimation traits to the op. The sparsity estimation traits we currently have may not be enough; if you identify additional frequently occurring cases, feel free to create additional traits for them.
Eventually, we would like to have at least a straightforward sparsity estimation for all DaphneIR ops. Some ops may be difficult to support, but we should at least support the ops in the following categories (see the section comments in DaphneOps.td) for now: data generation, matrix multiplication, elementwise unary/binary/ternary, row/column/cumulative aggregation, left/right indexing, reorganization, "other" matrix operations, CastOp, and data preprocessing. If individual operations turn out to be complicated, we can, of course, talk about them.
Sparsity estimation is only needed for matrix-typed results for now.

This issue is one particular aspect of #500.

Garic152 commented 5 months ago

I started working on this issue and so far I was able to come up with basic estimators for:

Unary
- diagMatrix()
- sliceCol()
- sliceRow()
Binary
- outerBinary
- addition/subtraction
Aggregation
- fullAgg()
- rowAgg()
- columnAgg()
Data Generation
- fill()
- rand()
- sample()
- seq()
Elementwise Unary
- Arithmetic/general math
- Trigonometric/Hyperbolic

I wrote down my thoughts and the estimators i came up with in an Overleaf Document (if there is a better way to share this, please let me know as I didn't find any alternatives)

Before working on the estimators themselves, i took some time to write a basic python script which plots the average estimation error for different sparsities and matrices of random sizes over many iterations. It only tests randomly distributed matrices for now, so if needed, I will write a more sophisticated script soon.

Before implementing the estimators, a short feedback or go signal would be great, because for some estimators I am not 100% sure if they are ready to use for Daphne.

I will update this comment as I add more operations, but you can always check the document overleaf for updates on my progress.

pdamme commented 5 months ago

Hi @Garic152, thanks for the update. An Overleaf document is okay for temporary notes. Once you implement the sparsity estimators (interfaces/traits), please add a condensed form of these thoughts as source code comments to convey the idea behind the concrete formulas (no need for diagrams there).

You don't need to invest much more time into the Python script, e.g., you don't need to support more than uniform random matrices.

Feel free to start implementing the estimators in DAPHNE, below you find some feedback on the individual ideas.

Regarding the concrete operations:

diagMatrix
- the if-then-else is not required, the special case is covered by the general case
- needs a special case for unknown input sparsity
- otherwise it looks good to me
sliceCol/sliceRow
- assuming S_out = S_in looks good to me (and already covers the case for unknown input sparsity)
binary operations
- outer binary
- you need to take the concrete binary op into account (e.g., add, mul, min, max, ...)
- addition/subtraction
- the typical naive formula for addition is: S_out = S_A + S_B – S_AS_B (think of the probability that a cell is a non-zero in A and/or* in B, ignoring occasional cancel-out for simplicity, similar to probability of the union)
- subtraction is essentially addition with negated B, so the same formula applies
- your formula S_out = S_A S_B is a naive estimate for multiplication (output cell is zero if it is zero in A and* B, similar to the probability of the intersection)
aggregation
- in general, also take the agg func into account
- full aggregation
- no need for sparsity estimation here, since they return a scalar
- row/column aggregation
- the assumption "result cell zero exactly when all cells in row/col are zero" does not hold for the agg funcs in DAPHNE, not for sum (-/+ could cancel out), mean (same as sum), var (think of repeated constant values), stddev (same as var), min (single zero suffices), max (same as min), idxMin (hard to tell), idxMax (same as idxMin)
- with the "exactly", it holds for sum and mean, though
data generation
- fill: LGTM
- rand: LGTM
- sample
- take withReplacement into account, since it determines if the output can contain multiple zeros or at most one
- seq: LGTM
elementwise unary
- arithmetic/general math
- abs, sign, sqrt: LGTM
- exp: LGTM
- ln: makes sense, so assume S_out = 1
  - other properties can and will be implemented in the future and can then be used to refine the estimation
- trogonometric/hyperbolic
- asin, atan, sinh, tanh: LGTM
- cosh: LGTM
- acos: okay, assume -1
- sin: okay, assume -1
- cos and other cases for sin: okay, assume -1

Garic152 commented 4 months ago

Thanks for the review! I will start implementing the first estimators for now, and correct the wrong estimators when I go back to the theorical stage.

Garic152 commented 4 months ago

I implemented all the elementwise unary operations as well as the sliceCol/Row function now, but encountered 2 smaller problems.

The first one is that neither sliceCol(new mat, old mat, int, int) nor sliceRow(new mat, old mat, int, int) are getting recognized as DSL builtin functions when using them in .daph scripts. I initially thought i was calling them incorrectly or with the wrong function name, but verified the function name in the test files and specifically got the error unknown built-in function: sliceCol, which also doesn't imply wrong arguments.

The second one is that I was using

std::vector<double> daphne::FillOp::inferSparsity() {
    auto arg = getArg();
    auto constantOp = arg.getDefiningOp<mlir::daphne::ConstantOp>();
    auto floatAttr = constantOp.getValue().dyn_cast<mlir::FloatAttr>();
    auto v = floatAttr.getValueAsDouble();

to extract the argument as a double in the fill op sparsity estimation in .../DaphneInferSparsityOpInterface.cpp, but this is only working for this one datatype now and is leading to pointer errors for other datatypes. I also tried working with dyn_cast<daphne::UnknownType> but this didn't work either after many tries. Is there a way to efficiently extract any scalar from getArg()?

I woul'd appreciate any kind of advice!

pdamme commented 4 months ago

Hi @Garic152,

The slice/extract as well as the insert operations are not generated through DaphneDSL built-in functions, but through right indexing and left indexing in DaphneDSL, respectively.
As extracting a compile-time constant from an mlir::Value is a frequent task in the DAPHNE compiler, we have some utility functions for that in src/compiler/utils/CompilerUtils.h, such as the variants of constantOfAnyType(), isConstant(), constantOrThrow(), and constantOrDefault(). Most of these are templates so if there is no specialization for the type you need yet, you can add it in src/compiler/utils/CompilerUtils.cpp. Keep in mind that any mlir::Value may not be backed by a compile-time constant (yet).

Garic152 commented 4 months ago

I now added all the functions you approved in my latest commit, but now some of the tests fail for a reason i can't really explain.

/daphne/test/api/cli/vectorized/VectorizedPipelineTest.cpp:51: FAILED:
  CHECK( generalizeDataTypes(outNN.str()) == generalizeDataTypes(outVR.str()) )
with expansion:
  "<SomeMatrix>(3x3, double)
  124 124 124
  124 124 124
  124 124 124
  "
  ==
  "[error]: Lowering pipeline error.
RewriteToCallKernelOpPass failed with the following message [
no kernel for operation `fill` available for the required input
  types `(f64, index, index)`

When manually testing x = fill(as.f64(124.0), 3, 3); with my implemented sparsity estimation though, everything works as expected, including the sparsity output, which I programmed to handle any scalar input type that is also supported by the fill function. Also, in cases where constantOfAnyType() can't extract a constantOp, the function returns -1.0.

Do you know if this is a mistake on my part in the new sparsity estimation function, or if the new sparsity estimates change the behavior of some of the existing tests?

After I am able to fix this issue, my plan was adding the tests for the currently exsting sparsity estimations so we have some working estimations and then continue to work on the next sparsity estimations/fix the ones you didn't approve.

Garic152 commented 4 months ago

I revisited this problem today and found out that I accidentally swapped the sparsities for the fill sparsity estimation, the tests for this operation don't fail anymore.

What still fails is

DecisionTreeRandomForestTest.cpp:47
{Unknown expression after the reported line}
due to unexpected exception with message:
  stod

I will continue writing the sparsity tests now, and will probably find a mistake in the sparsity estimates then, if there are any left.

Edit: For the tests, I added a new file in my recent push, as this made the most senese to have a good overview over the sparsity estimation tests for all operations. For the FillOp and SeqOp i decided to go for a TEMPLATE_PRODUCT_TEST_CASE, as the estimations also need to work for all supported scalars.

pdamme commented 4 months ago

Hi @Garic152, great that you've already solved the problem related to the VectorizedPipelineTest.

I had a look at the code in your branch Garic152:dev-issue-766.

Regarding the failing test cases (DecisionTreeRandomForestTest.cpp): It's unfortunate that the error output displayed here is not optimal, the root cause is not related to stod. When the error message displayed in a test run is not helpful, it is often a good idea to execute the test case separately:

bin/daphne test/api/cli/algorithms/decisionTreeRealData2.daphne data=\"test/data/wine/winequality-red-white.csv\" dt=1 maxV=1.0

[error]: Lowering pipeline error.InferencePass.cpp:498 -> Unsupported type for SeqOp sparsity inference
PassManager failed module lowering, responsible IR written to module_fail.log.

So the actual error happens in daphne::SeqOp::inferSparsity(). Having read your code, now I understand that a particular difficulty here is that the arguments of SeqOp could have various value types, but the value type determines how we extract the constant value. You could take inspiration from the shape inference of SeqOp (src/ir/daphneir/DaphneInferShapeOpInterface.cpp, daphne::SeqOp::inferNumRows()), where there is the same problem. Although the solution there is not 100% satisfactory, since it needs to enumerate all supported value types (or, just the most frequently used three).

Some additional remarks on your code:

InferSparsityInterface implementations:
- For some of the new interface implementations, you could use existing sparsity inference traits to simplify the code (see src/ir/daphneir/DaphneInferSparsityTraits.td), e.g.
- SparsityFromArg (e.g., for SliceColOp), see ReshapeOp in src/ir/daphneir/DaphneOps.td
- CompletelyDense (e.g., for EwExpOp)
- Feel free to introduce new traits for repeated interface implementations, e.g.
- SparsityUnknown (always return -1) (e.g., EwSinOp)

Regarding the sparsity test cases: It would be great to test the new sparsity estimators. I see two things one could test here: (1) if the sparsity is correctly estimated according the the simple formulas we use, and (2) how accurate the sparsity estimates are. I think your test cases go more into the direction of (2), since you execute the kernels and count the resulting non-zeros. However, that doesn't involve the sparsity estimators you implemented; but it could serve to experimentally double-check if the formulas make sense. However, I think (2) is out of scope for this issue/PR, and it would be better to concentrate on (1). We don't have an established way to test our type/property inference yet, but one option would be to introduce a new SparsityOp (akin to NumRowsOp) that gets a single matrix as argument and returns the compile-time sparsity estimate. To that end, the operation could get replaced by the argument's sparsity if the sparsity is known (by a canonicalization akin to mlir::daphne::NumRowsOp::canonicalize in src/ir/daphneir/DaphneDialect.cpp). For all cases where the sparsity remains unknown at compile-time, we could either replace the operation by a -1 constant in the end or by a runtime kernel that always returns -1.

PS: In general, feel free to open a draft PR to discuss concrete questions on your code. With a draft PR, it's unambiguous which code/branch you really mean ;) .

Garic152 commented 4 months ago

Thanks for the advice again!

I tried the type extraction method from src/ir/daphneir/DaphneInferShapeOpInterface.cpp, which fixed the remaining errors I mentioned.

For the traits, I wasn't 100% sure how to use them, but now that you clarified it, I implemented a SparsityRemains and a SparsityUnknown trait, which are able to replace all the currently existing EWUnary sparsity inference functions.

Regarding the testing, I will start working on a SparsityOp next then, which will also make the testing a lot easier I think.

Should I create a new branch/issue for the SparsityOp or should I just imlpement it as a part of this issue?

Update: The sparsity op and the adjusted tests are done already and in the draft-branch, but after creating a clean branch to rebase on main, I first have to fix some issues with the seq op

pdamme commented 4 months ago

Hi @Garic152, I've had a quick look at your draft-branch; the new SparsityOp looks good to me. Yes, a separate PR for this operation would make sense (but you don't need to write an issue for it). After that, you can use sparsity() in your test cases for this issue.

Garic152 commented 4 months ago

Would the SparsityOp as a single feature also need seperate testing? The only way I can think of testing it right now would be together with the sparsity estimation tests I started writing, as they cover all different cases for the SparsityOp.

pdamme commented 4 months ago

Yes, feel free to omit test cases in a PR for the SparsityOp. The test cases that you add for the sparsity estimation will be sufficient.

Garic152 commented 4 months ago

A short update: I have added my ideas for the next sparsity estimations in the Overleaf document.

This includes:

Outer Binary
Aggregation
Reorganization
Revision of sample()
Additions to Elementwise Unary
More functions in Other Operations

If you have the time, feel free to tell me if any of the new estimators would need a complete revision and i will implement the changes accordingly.

For the EWBinary, I should be able to just take the estimators of the Outer Binary operations if I'm not mistaken, as the Outer Binary estimators are based on elementwise probabilities, just like the EWBinary ops. (Edit: If we do that, for operations like EwAddor EwMul, we get an an average deviation of about 0.03, which i am not sure is good enough even for a basic sparsity estimation)

Also, for some operations, I wasn't 100% sure whether there was a naive sparsity estimator or not (which I also mentioned in the document). Before going back to them, I will add the code for the estimators I am most certain are correct.

Garic152 commented 3 months ago

I've now implemented sparsity estimation for the outer binary operations, but I've found that compareDaphneToRefSimple is really suboptimal for testing the functionality of the sparsity estimators, as there are many different cases that need to be checked, and if one of them fails, it's difficult to see where exactly the error came from. (An example can be seen here, even if i were to split the testing into multiple files for each operation category, it would still be too many tests in one file and the amount of files would quickly get out of hand as well).

Is there perhaps a better alternative for more structured script-level testing for this task? The kernel itself unfortunately can't be used for testing right now because its only function is to return a -1 if the canonicalization was not successful.

Garic152 commented 3 months ago

I have added sparsity estimation support for most daphne operations now in my latest commit, but encountered some issues with the EWBinary operations. The problem is that I didn't implement a way to see whether the sparsity estimation function reveices a scalar or matrix for the lhs or rhs, which leads to errors when using lhs/rhs.getSparsity();. A quick fix was using the following code for lhs and rhs:

auto lhs = getLhs().getType().dyn_cast<daphne::MatrixType>();
    double lhsSparsity = 0.0;
    if (lhs) {
        lhsSparsity = lhs.getSparsity();
    } else {
        lhsSparsity = -1.0;
    }

Is there any better way to handle scalars other than just assigning -1.0 to the sparsity? I unfortunately didn't find any suitable method inside of DaphneOps.cpp.inc which is why i went with this approach for now.

Garic152 commented 1 month ago

I managed to fix most of the errors with the above mentioned workaround but am still encountering one error with AlgorithmsTest.cpp file in the /daphne/test/api/cli/algorithms/ directory when running the tests.

   /daphne/test/api/cli/Utils.h:261: FAILED:
     CHECK( status == exp )
   with expansion:
     2 == 0

After some further investigation, the error lies in the following test

   ./bin/daphne --select-matrix-representations --args n=100,e=100 test/api/cli/algorithms/components.daphne

which executes components.daphne and then fails at line 22 with the following error message:

   [error]: Lowering pipeline error.{}
   PassManager failed module lowering, responsible IR written to module_fail.log.
   RewriteToCallKernelOpPass failed with the following message: [ no kernel for operation `ewMul` available for the required input types `(!daphne.Matrix<?x?xf64:rep[sparse]>, !daphne.Matrix<?x?xf64>)` and output types `(!daphne.Matrix<?x?xf64>)` ]
...
    | Source file -> "/daphne/test/api/cli/algorithms/components.daphne":22:21
    |
 22 |     u = max(aggMax(G * t(c), 0), c);
    |

I already tried various different methods of getting the lhs and rhs variables but the problem might be the matrix representation.

I would appreciate any kind of help with this problem, as it is the last problem before I can do a PR for #766.

philipportner commented 1 month ago

The error message indicates a missing instantiation for the types required for this specific kernel call. Looking at the src/runtime/local/kernels/EwBinaryMat.h kernel it seems like that combination is not yet implemented.

You'd have to add a kernel template instantiation that takes a CSRMatrix and DenseMatrix as inputs and outputs a DenseMatrix. Similar to how it is done here for the same input types but CSRMatrix as output type.

pdamme commented 1 month ago

Indeed, adding the missing kernel specialization would solve the immediate problem that makes the test case fail.

However, the root cause is a different one. As we can see in the error message, the critical operation in components.daphne is G * t(c) (line 22). Here, G is a sparse (n x n) matrix (adjancency matrix of a graph) and c is a dense (sparsity = 1.0) (n x 1) matrix. Representing the result of the elementwise multiplication as a DenseMatrix (as the missing kernel would do) could cause a prohibitive memory consumption (for large n). Thus, we should ensure that the DAPHNE compiler selects a sparse representation for the result of G * t(c).

On the main branch, the DAPHNE compiler selects a sparse representation for the result of G * t(c), but with the changes on your fork/branch (dev-issue-766), it no longer does. This can be double-checked by running:

./bin/daphne --select-matrix-representations --args n=100,e=100 --explain property_inference test/api/cli/algorithms/components.daphne

IR after inference:
module {
  func.func @main() {
    ...
    %15 = "daphne.seq"(%10, %3, %10) : (f64, f64, f64) -> !daphne.Matrix<100x1xf64:sp[1.000000e+00]>
    %16 = "daphne.cast"(%15) : (!daphne.Matrix<100x1xf64:sp[1.000000e+00]>) -> !daphne.Matrix<100x1xf64>
    %17:2 = scf.while (%arg0 = %9, %arg1 = %5, %arg2 = %16) : (si64, f64, !daphne.Matrix<100x1xf64>) -> (si64, !daphne.Matrix<100x1xf64>) {
      ...
      scf.condition(%22) %arg0, %arg2 : si64, !daphne.Matrix<100x1xf64>
    } do {
    ^bb0(%arg0: si64, %arg1: !daphne.Matrix<100x1xf64>):
      %18 = "daphne.transpose"(%arg1) : (!daphne.Matrix<100x1xf64>) -> !daphne.Matrix<1x100xf64>
      %19 = "daphne.ewMul"(%14, %18) : (!daphne.Matrix<100x100xf64:sp[1.990000e-02]>, !daphne.Matrix<1x100xf64>) -> !daphne.Matrix<100x100xf64>
      ...
      scf.yield %24, %23, %21 : si64, f64, !daphne.Matrix<100x1xf64>
    }
    ...
  }
}

The problem seems to be that the sparsity inference for EwAddOp changed (actually, the original one was a bit debatable as it returned the sparsity of one input if the other one's is unknown; I prefer the new variant over that). G doesn't change after its creation, but c changes in the loop. We need to find out why the sparsity of c is unknown inside the loop, when it is known to be 1.0 before the loop. I assume that the DAPHNE compiler thinks the sparsity of c changes inside the loop (which it actually doesn't) and sets it to unknown before the loop. Ideally, in the scf.yield the DAPHNE compiler should still know that c/%21 has a sparsity of 1.0. Then, the sparsity of c should be known inside the loop and, thus, the sparsity of both operands of EwMulOp should be known and we would know that the result of this op has the same sparsity as G and should be represented as sparse; hence, no need for a new kernel.

Garic152 commented 1 month ago

Thanks for all the advice!

Unfortunately, reverting the EWAddOp to its original state didn't fix the problem.

What I did find out however is that one of the reasons the test fails was due to the new implementation of the ewMax and aggMax sparsity estimation, which currently is using:

if(op->hasTrait<SparsityRemainsIfAllInputOneOrZero>()) {
...

This ensures that the output sparsity is set to 0 if both input sparsities are 0, and 1 if both are 1. For cases where the input sparsities are not both 0 or 1, the output sparsity remains unknown, like with u in this testcase.

I originally chose this implementation because my understanding is that for many matrix types, if the input sparsities aren't both 0 or 1, it will be extremely hard to estimate the output sparsity based on the input sparsity alone. This is also the reason why I didn't think it would be a good idea to assume randomly distributed matrices, because for this case it doesn't work well.

This will cause the whole while loop to work with unknown sparsities and therefore produce the wrong result.

If my assumption about the randomly distributed matrices not working was wrong, I will implement this into the code!

Edit: Adding an aggMax sparsity estimator for both row and column agggregation does fix part of the problem, but leads to another missing kernel for the maxRow(rep[sparse]) operation. Adding the ewMax sparsity estimator also partially fixes the error of the unknown sparsities, but introduces other similar test failures.