AtheMathmo / rulinalg

A linear algebra library written in Rust
https://crates.io/crates/rulinalg
MIT License
287 stars · 58 forks

RESEARCH: Supporting column-major matrices #26

Open AtheMathmo opened 8 years ago

AtheMathmo commented 8 years ago

Gulp...

Is this worth doing?

It will allow us to integrate more nicely with BLAS/LAPACK - and we get things like free transposition as a result. We currently have a lot of the framework to do this but will need to make a lot of changes - especially to the decomposition algorithms and other areas optimized for row-major.

Note also that our iter_rows usage will become difficult...

This issue should track potential pitfalls and criteria. We'll make a new issue to track the actual implementation if it happens.

tafia commented 8 years ago

I was thinking about it too. For now I don't think it is needed. I see two alternatives:

AtheMathmo commented 8 years ago

I'd been thinking about the first alternative too - this is indeed the most common use case and may be worth exposing. However, I don't think it is a clean solution long term.

I'm glad to hear that you think it isn't needed for now. I've been working on a blog post about in-place matrix transposition and it made me feel like maybe it would be nice to have sooner rather than later.

(Draft of post in question)

tafia commented 8 years ago

I had no idea it was that complicated!

However, I don't think it is a clean solution long term.

Why not? It looks like the NOR function to me: it looks more complicated, but it is in fact closer to what is actually happening.

AtheMathmo commented 8 years ago

I meant the first solution only - in that case I think it would be preferable to just support column-major/stride wrapping across all operations instead of having a single exception for transpose multiplication. (In terms of implementation this would also require A^T * B and A * B^T for all combinations of Matrix, MatrixSlice and MatrixSliceMut.)

The second alternative seems like it may be enough - but I'm not sure that the complexity wouldn't be much more than just supporting column major directly.

It is possible I am misunderstanding what you're suggesting - I'd be very happy if it truly wasn't all that complicated to support these efficiently.

c410-f3r commented 8 years ago

What I did in CSC/CSR sparse matrices is more or less what @tafia said about creating a new struct and swapping cols/rows.

If a new struct is going to be created and the BaseMatrix trait will be modified, things could look like this: tree

Anyway, it is just an idea.

tafia commented 8 years ago

This looks great indeed.

The question is whether we can easily implement specialization when needed.

AtheMathmo commented 8 years ago

The real concern for me is how we handle the decomposition algorithms for both row-major and column-major. I think the only real solution is writing them twice - we can't really generalize over the access patterns.

I agree that @c410-f3r's suggestion looks good, but I'm not sure it's something we can put on the road map yet. I think sparse matrices need some time to settle (after merging, of course) before we start taking emphasis off of dense linear algebra, if that makes sense? Either way, thanks for the comment @c410-f3r! I think if we move forward with this we would use a similar approach to what you suggest.

Andlon commented 8 years ago

@AtheMathmo: I think with decompositions, you don't need to write the algorithm twice. I can think of two alternatives. In both cases, the decomposition algorithm is written exactly once, for only one of the storage orders.

  1. If the decomposition is attempted on a matrix of an incompatible storage order, it panics or fails at compile time, so the user must first copy the matrix into the right storage order. We would of course provide convenient functions to easily do so, e.g. mat().to_row_major().qr().
  2. If the storage order does not match, simply copy the matrix into the right storage order. Note that pretty much all decompositions are O(n^3) and copying is basically the fastest O(n^2) algorithm there is, so the overhead should still be negligible in most cases. In this case, we should still document which storage order the decomposition is optimized for, and perhaps provide a section in the documentation labelled "Performance tips" or so.

From a user experience perspective, option 2 is a clear winner. I think ideally we don't want our users to have to worry about storage order unless they have to. In other words, I think worrying about storage order should be opt-in, letting our users focus on the actual linear algebra until they really find themselves in a situation where they truly need to squeeze out maximum performance (which I bet few of our users actually will in the end).

Also, I agree that this would require some pretty drastic changes, and I think we haven't yet learned enough to put it on the road map, as @AtheMathmo points out. However, I think it's great that we get some more contribution to this issue! Thanks for your insights from your work on sparse matrices, @c410-f3r, they are really valuable.