daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0
67 stars 62 forks source link

[DAPHNE-#523] DaphneDSL flexible matrix/frame literals #675

Closed AlexRTer closed 5 months ago

AlexRTer commented 8 months ago

This PR implements #523 and allows for matrix literal elements to be expressions of different value types instead of only literals with the same value type. Furthermore, matrix literal dimensions can now be set directly using parentheses ([...](dim_x, dim_y)) instead of having to call reshape. Lastly, it adds new syntax to create column-major as well as row-major frames in DaphneDSL. It also resolves #584 and #622.

Features:

Changes:

Limitations:

In the future, this could also be extended to support matrix literals where an element itself is a (flat) matrix by adding the matrix's element type to valueTypes and adding slice and rbind operations at the end of the buildColMatrixFromValues helper function.

AlexRTer commented 8 months ago

It appears there is an issue when running ./test.sh. Running ./test.sh ~"EwBinary*" passes all tests. However, so does ./test.sh "EwBinary*". Only when running all tests it seems to get stuck on line 35 of EwBinaryScalarTest.cpp which calls runDaphne, but I can't make out why.

AlexRTer commented 8 months ago

It appears there is an issue when running ./test.sh. Running ./test.sh ~"EwBinary*" passes all tests. However, so does ./test.sh "EwBinary*". Only when running all tests it seems to get stuck on line 35 of EwBinaryScalarTest.cpp which calls runDaphne, but I can't make out why.

Specifically, the runDaphne call gets stuck after calling runProgram in the same Utils.h file and reaching line 95 where it attempts to read from an stdout buffer. Nothing within the loop seems to be executed.

I still do not know why, but omitting the final ReshapeOp in the matrix literal visitor and returning colMatrix instead of its result does not cause it to get stuck and instead runs all tests successfully (except the new one specifically testing the reshaping). So the problem seems to stem from here:

mlir::Value result = static_cast<mlir::Value>(builder.create<mlir::daphne::ReshapeOp>(loc, utils.matrixOf(valueType), colMatrix, rows, cols));
pdamme commented 8 months ago

Thanks for this PR @AlexRTer. I can confirm the problems with the test cases, but I don't see the reason at first glance. I will look deeper into it.

AlexRTer commented 8 months ago

Thank you very much for your help

AlexRTer commented 6 months ago

I added some negative tests for matrix and frame literals.

Currently Row-major Frames allow for different column lengths, as long as the total amount of elements is a multiple of the amount of columns (the elements are simply split up evenly to all columns). For Column-major Frames, shape inference catches this case. Seeing as Column/Row -major notation is equivalent, perhaps it would be better to convert Row-major notation into Column-major notation before visiting the Frame expressions. Ideally, this would simplify Frame Literals to a single Visitor for both notations. Does Daphne currently have a way to do this?

It appears there is an issue when running ./test.sh. Running ./test.sh ~"EwBinary*" passes all tests. However, so does ./test.sh "EwBinary*". Only when running all tests it seems to get stuck on line 35 of EwBinaryScalarTest.cpp which calls runDaphne, but I can't make out why.

Thanks to c2900a3, the above issue has been resolved.

However, there seems to be an error with codegen/CodegenTest.cpp (https://github.com/daphne-eu/daphne/actions/runs/9034047874/job/24825710151#step:6:28) after the last commit that I cannot reproduce. The last commit 0e7090a only added a few matrix/frame tests and minor formatting changes. On the other hand, the tests beforehand (9bf8281) (https://github.com/daphne-eu/daphne/actions/runs/9033304615/job/24823355246) as well as my tests with the latest build both ran succesfully.

AlexRTer commented 6 months ago

Thanks for your feedback and editing @pdamme, I like the changes you made.

I implemented another parser rule like you suggested which fixes the issues with verifying frame literal row/column sizes, and allows expressions to be labels for frame literals of rows. Tests for matrix literals with one or no dimensions specified have also been added.

Before you merge it, I would like to refactor the code a bit and look into supporting the missing value types. Also, I still need to edit the documentation regarding support for expressions as labels.

AlexRTer commented 6 months ago

Matrix and Frame literals now support all value types mentioned in the documentation except for strings (i.e. si, ui with 64, 32, 8 bits, bool, f64, f32). I removed the corresponding failure test case, added tests for both matrices and frames to verify proper casting to these value types, and edited the documentation accordingly.

I added some kernel instantiations to kernels.json that are needed to support the MatrixConstant, CastScaObj, InsertRow and Reshape operations for matrices with these value types.

A remaining limitation for frame literals of rows is that rows themselves cannot be specified as expressions, unlike their elements. I am currently unaware of how to iterate over a MatrixConstant's values in the parser. This would make it possible to add support for row expressions in the frameRowMat visitor and would be preferable to creating InsertRowOps for all of its elements in my opinion.

If you think this could be left to a follow-up issue, you can merge this PR now if you would like to.