JuliaLang / julia


Interoperability and completeness of special matrix types #8001

Closed: tkelman closed this issue 5 years ago

tkelman commented 9 years ago

At present there are a variety of special structured matrix types - diagonal, bidiagonal, tridiagonal, symtridiagonal, banded, triangular, hessenberg, symmetric, hermitian, sparse CSC, and so on - in base Julia. The completeness and performance of these implementations vary widely, since they've been implemented piecemeal, as needed, by @viralbshah, @andreasnoack, @jiahao, and many others.

Managing this diversity effectively is a hard problem; there are only a handful of well-engineered systems in this area. I hope Julia can improve here.

There's a lot to this, so it's probably beyond the scope of 0.4. Many of these specialized array types do not need to be in base long-term, but they should be default (and ideally precompiled) packages. Moving things out to packages too early, especially before the right frameworks are in place to support good interoperability, could in the worst case lead to fragmentation and lower visibility. I also very badly want to narrow the divide between dense and sparse linear algebra here, as the latter is a bit of a bastard stepchild in the current system.

I don't have any concrete proposals here yet, but I think we should open a discussion. The challenges are very different, but look at how the JuliaOpt team has designed MathProgBase, for example: its hourglass architecture has done an amazing job of tying a diverse set of solvers to multiple modeling environments. Can we similarly identify the narrow waists and common protocols for structured linear algebra?

The near-future implementation of CSR sparse (for which there's already an initial PR, #7029) and arbitrary-dimensional COO sparse might make a good proving ground to start coming up with ideas here.

TL;DR: we have lots of array types, how do we make them work together better?

jiahao commented 9 years ago

Some thoughts:

  1. We should limit this discussion to rank-2 objects (matrices or matrix-like operators) and not the general rank-k array case. Too much fails to generalize beyond rank 2.
  2. Let's distinguish between sparse and dense array types. This matters because the interaction between sparse matrix data structures and sparse matrix algorithms is much less well understood than in the dense case.
  3. To the extent that special matrices enable dispatch on specialized methods for linear solvers and eigensystem computations, etc., we'd eventually want complete representation of matrix types supported by LAPACK.
  4. It is useful to distinguish between "structured matrices" - matrices with special structure like Diagonal or SymTridiagonal - and "sparse matrices" which have more complicated/general sparsity patterns.
  5. There are two broad categories of structured matrices. First are Diagonal, Bidiagonal, SymTridiagonal, Tridiagonal, etc., which can be considered special cases of banded matrices, which we currently don't represent in Julia. (LAPACK type GB for general banded; also the more exotic types PB for symmetric positive definite banded and TB for triangular banded.)
  6. The second broad category of structured matrices encapsulates matrices with known global symmetries, like Symmetric or Hermitian. These special matrix types are really thin wrappers around dense matrices; the types are used primarily for dispatch rather than to encapsulate a storage layout different from Array{T,2} (see the sketch after this list). This category of special matrices also has multiple representations; in particular, LAPACK distinguishes between conventional storage and packed storage formats:

    | special matrix                        | conventional | packed |
    | ------------------------------------- | ------------ | ------ |
    | complex symmetric                     | SY           | SP     |
    | symmetric/Hermitian positive definite | PO           | PP     |
    | symmetric/Hermitian indefinite        | SY/HE        | SP/HP  |
    | triangular                            | TR           | TP     |

    In theory, packed matrices offer more compact memory usage. It's not clear, however, that this always translates into faster routines for packed formats relative to their conventional dense-storage counterparts.

  7. A large part of why sparse linear algebra is the "bastard stepchild" is that there is no general consensus on what sparse matrices are or how to design efficient algorithms that work on them. There are quite a few sparse matrix formats, and supporting a complete set of functionality and interop between the formats is very painful.
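A minimal illustration of point 6, written against today's LinearAlgebra API (which postdates this thread): the Symmetric wrapper keeps a reference to the underlying array and exists mostly so that dispatch can select symmetric-specific methods.

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
S = Symmetric(A)   # thin wrapper: no copy is made, S.data === A
eigvals(S)         # dispatches to the symmetric eigensolver path
typeof(S)          # Symmetric{Float64, Matrix{Float64}}
```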
tkelman commented 9 years ago
  1. Sure. Aside from the note that COO sparse can be generalized to higher dimensions, the rest of the discussion here should be about matrix-like operators.
  2. When it comes to low-level implementation, of course sparse and dense are very different. What I want to avoid is designing the entire system in a way where generic operations are fundamentally broken for sparse matrices. We can't just ignore them.
  3. That's one fairly narrow measure of "completeness" and a good goal, but we should aim higher.
  4. Part of my point is that these structural properties and sparsity patterns compose and interact in myriad ways that we want to be able to represent. E.g. my matrix is block tridiagonal with repeated sparse symmetric quasidefinite blocks (not making that up at all, btw; this is exactly the type of problem I most frequently solve). In an ideal world I shouldn't have to rewrite half of the LinAlg module to make this matrix type work efficiently, seamlessly, and generically.
  5. Or, for example, SB (symmetric banded), for which LAPACK has only eigenproblem routines and no factorizations at all; Linda Kaufman published an algorithm for exactly this relatively recently (2007). This is the kind of active research that Julia can implement faster than LAPACK can. And there's no reason to talk only about the dense case here: all of these structural properties apply equally to sparse matrices.
  6. Again, symmetry is not a dense-only property in any way. Let's not put it into Julia as if it is.
  7. Very true. There are several other toolsets out there that are much further along this path than Julia and, as a result, are vastly superior platforms for doing exactly the research that's required on sparse formats and algorithms. Rather than ignoring the problem because it's too hard, let's consider what can be done to make that work easier in Julia. Can we make Julia a world-class tool for cutting-edge sparse linear algebra research? I want it to be, and I hope others do too, but right now it very much isn't.
jiahao commented 9 years ago

> symmetry is not a dense-only property in any way. Let's not put it into Julia as if it is.

True. @andreasnoackjensen just merged #7992, which makes the symmetric matrix types parametric on other matrix types.

I didn't mean to imply that sparse matrix support is second-class or unimportant. It's just a messier problem and I don't know a good strategy going forward.

simonbyrne commented 9 years ago

I've been thinking about this recently in terms of matrix multiplication (particularly for #6837): the key problem is the quadratic explosion of methods, since every additional type has to implement multiplication with every other type. Is there a generic way to implement matrix multiplication so that each new type only needs to define O(1) additional functions?

At their core, all the multiplication methods basically differ in how they iterate over pairs of elements from each matrix (column-wise, row-wise, block-wise, etc.). Could we define these various iteration orders for each matrix type, along with an associated "cost" (e.g. CSC is cheap by column, very expensive by row), and have the multiplication choose the most efficient iteration scheme? A sketch follows.
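One hedged sketch of that idea. All of the names here (IterationCost, itercost, pick_scheme) are hypothetical, not existing Julia API: each matrix type advertises the cost of iterating in a given order, and a generic kernel queries the trait to pick a traversal.

```julia
using SparseArrays

# hypothetical trait: how expensive is it to walk a matrix in a given order?
abstract type IterationCost end
struct Cheap     <: IterationCost end
struct Expensive <: IterationCost end

itercost(::AbstractMatrix,  ::Val)       = Expensive()  # conservative default
itercost(::Matrix,          ::Val{:col}) = Cheap()      # dense: cheap either way
itercost(::Matrix,          ::Val{:row}) = Cheap()
itercost(::SparseMatrixCSC, ::Val{:col}) = Cheap()      # CSC: cheap by column
# no :row method for CSC, so row traversal falls back to Expensive

# a generic multiplication kernel would then choose its loop order:
pick_scheme(A) = itercost(A, Val(:col)) isa Cheap ? :bycolumn : :byrow
```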

andreasnoack commented 9 years ago

@tkelman Thanks for opening this issue. Much of the linear algebra functionality has developed organically rather than following a master plan. When Symmetric was introduced, it was a way to dispatch directly to the symmetric eigensolver; only later was the broader usefulness of a Symmetric type recognized, which eventually led to the version merged this morning, where it is parametrized on the matrix type. I am working toward making the linear algebra code more generic, so that the structured types don't require a specific underlying matrix type. Triangular and Symmetric/Hermitian now allow all kinds of AbstractMatrix types, and I plan to extend this to the other structured types, so that you could have e.g. a Diagonal with matrix elements as a way of storing a block-diagonal matrix (see the sketch just below). Hopefully this is in line with some of what you are asking for.
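A minimal sketch of that last idea, which does work in current Julia: a Diagonal whose elements are themselves matrices behaves like a block-diagonal operator, provided the right-hand side is blocked conformally.

```julia
using LinearAlgebra

# a block-diagonal matrix stored as a Diagonal with matrix elements
D = Diagonal([ [2.0 1.0; 1.0 2.0], [3.0 0.0; 0.0 3.0] ])

# a conformally blocked vector: one sub-vector per diagonal block
x = [ [1.0, 2.0], [3.0, 4.0] ]

D * x   # elementwise d .* x: each block multiplies its own sub-vector
```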

It is a problem that the potential number of methods grows so fast in the number of types; I don't think it is feasible to support all possible combinations. Right now some of the types cover the gaps by promoting explicitly to dense matrices. It is possible that this could be done more automatically, but I haven't figured out how, and actually I'm not sure we really want it: a no-method error might be better than a huge sparse matrix getting implicitly promoted to a dense one. At least temporarily, we might want to force the user to be explicit about the order of evaluation of expressions involving special matrices, e.g. writing A*(B*x) to avoid the matrix-matrix product between two very complicated matrices.

You are more experienced with sparse matrices than I am, so please continue to share your thoughts on these issues. There is still plenty of work to do.

dmbates commented 9 years ago

I definitely appreciate Symmetric/Hermitian and Triangular being parametrized on the matrix type. I must admit that when I first saw the matrix type as one of the type parameters I was perplexed as to why this was done; I didn't think of sparse matrices, although in retrospect it is a natural direction in which to extend these.

When solving sparse systems, knowing that the system matrix is triangular is a big win, and knowing that it is symmetric/Hermitian helps you choose the type of decomposition and representation. Some of the recent code from the Watson Sparse Matrix Package (WSMP) or HSL (formerly the Harwell Subroutine Library) allows for a Bunch-Kaufman decomposition in addition to a Cholesky decomposition. That is, a sparse symmetric matrix need not be positive definite for its symmetry to be exploited when solving sparse systems.

It may be necessary to enforce the underlying structure in a symmetric/Hermitian or Triangular sparse type so that unnecessary copying can be avoided. For example, all the BLAS/LAPACK routines that apply to (non-packed) triangular matrices ignore the contents of the strict triangle not indicated by the uplo flag. For sparse matrices it may be necessary to ensure that the data member of the type is itself a lower/upper triangular matrix, or to pay the price of copying through tril or triu (see the sketch below). This would mean creating methods for arithmetic, etc., applied to symmetric matrices stored in only one triangle. WSMP allows two formats for symmetric sparse matrices: a triangle stored in CSC format, or an MSC format in which the diagonal is stored separately as a dense vector and only the off-diagonals from one triangle are stored in CSC format.
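A sketch of the one-triangle idea with today's wrapper types. The wrapper itself works; how many operations are actually specialized for this combination is exactly the gap discussed later in this thread.

```julia
using LinearAlgebra, SparseArrays

S = sprand(100, 100, 0.05)
A = S + S'               # a symmetric sparse matrix

L = tril(A)              # pay one copy to keep only the lower triangle
Sym = Symmetric(L, :L)   # wrapper reconstructs the upper triangle on access

Sym == A                 # true: same matrix, roughly half the stored entries
```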

tkelman commented 9 years ago

> imply that sparse matrix support is second-class or unimportant

No, you didn't do that; the current state of Julia's standard library does that by itself :/

@simonbyrne I was thinking of something similar this morning: isrowiterable(), iscolumniterable(), isdiagiterable(), etc. Transpose as a view in the dense case, or as a reinterpretation of the same data between CSR and CSC in the sparse case, should help start cleaning up the Ax_mul_Bx[!] system. It's not just matrix multiplication, though; there are all the reductions and various other unary, binary, and logical operators to worry about too.

> Right now some of the types cover the gaps by promoting explicitly to dense matrices. It is possible that this could be done more automatically, but I haven't figured out how, and actually I'm not sure we really want it: a no-method error might be better than a huge sparse matrix getting implicitly promoted to a dense one.

Right, I think the generics in base are not sufficiently generic, for exactly this reason. I shouldn't have to wait 2 minutes for mean(speye(10^5), 1), and min(speye(10^5), 1) shouldn't give a memory error. Some of this can be solved with more code, but that only fixes one missing method at a time, for one sparse type at a time. Are there ways we can make the generic implementations smarter (hopefully without hurting dense performance), perhaps by working over iterators of indexes? You've probably heard the Mike Vanier quote, "If I had a nickel for every time I've written for (i = 0; i < N; i++) in C"? I'm starting to feel the same way about for i = 1:A.n, j = A.colptr[i]:A.colptr[i+1]-1 in Julia (see the sketch below).
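That idiom spelled out as a self-contained example (field names as in SparseMatrixCSC; speye itself was later replaced by sparse(I, n, n)):

```julia
using SparseArrays

# the ubiquitous hand-rolled CSC traversal: column j owns the stored
# entries A.nzval[k] at rows A.rowval[k], for k in the colptr range
function colsum(A::SparseMatrixCSC)
    s = zeros(eltype(A), A.n)
    for j = 1:A.n, k = A.colptr[j]:A.colptr[j+1]-1
        s[j] += A.nzval[k]
    end
    return s
end

colsum(sprand(10^5, 10^5, 1e-6))   # touches only the stored entries
```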

> a sparse symmetric matrix need not be positive definite for its symmetry to be exploited when solving sparse systems

In fact, there are a number of applications in optimization where it is an absolute requirement to preserve symmetry with an indefinite matrix, in order to obtain the matrix's inertia (its number of positive, negative, and zero eigenvalues). A tiny illustration follows.
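Just to pin down the definition, via eigenvalues for clarity; a real sparse solver reads the inertia off an LDLᵀ or Bunch-Kaufman factorization without ever forming an eigendecomposition.

```julia
using LinearAlgebra

A = Symmetric([1.0 2.0; 2.0 1.0])   # indefinite: eigenvalues are 3 and -1
λ = eigvals(A)
inertia = (count(>(0), λ), count(<(0), λ), count(iszero, λ))   # (1, 1, 0)
```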

stevengj commented 9 years ago

I just ran into one of these limitations today. If A and B are SymTridiagonal, with B positive-definite, I wanted to use eigvals(A,B) to solve the generalized eigenproblem. There was no method. Then I tried eigs, and it failed for lack of an isposdef! method. Then I tried converting to sparse(A) as a fallback, but there is no such method. Finally, I gave up and just used full(A). Quite a frustrating experience.

At the very least, we should support sparse(A) for all of the special sparse (banded) formats.

[It would also be good to have them inherit from something like Banded (à la @jiahao's cleanup proposal) so that things like \ could convert them all to sparse as a fallback rather than operating on them as if they were dense matrices.]
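A minimal sketch of the missing sparse(A) fallback for SymTridiagonal, written against the modern spdiagm pair API (dv and ev are SymTridiagonal's diagonal and off-diagonal fields; a conversion along these lines was eventually added to Julia):

```julia
using LinearAlgebra, SparseArrays

# build the CSC matrix from the three diagonals of a SymTridiagonal
symtri_to_sparse(T::SymTridiagonal) =
    spdiagm(-1 => T.ev, 0 => T.dv, 1 => T.ev)

T = SymTridiagonal(fill(2.0, 5), fill(-1.0, 4))
symtri_to_sparse(T)   # 5×5 SparseMatrixCSC with 13 stored entries
```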

jiahao commented 9 years ago

@ivanslapnicar just pointed out that A*B*C fails when all three are Tridiagonal, since this evaluates to *((A*B)::Matrix, C::Tridiagonal), which is not defined. In contrast, A*B works fine and returns a dense Matrix. This is surprising behavior.

Update: Even worse, A*(B*C) works just fine, because we define an A_mul_B! method that handles *(::Tridiagonal, ::Matrix). So * is not associative. Epic fail.
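The failure mode, reproduced as reported above (historical behavior; later Julia versions fill this particular method gap):

```julia
using LinearAlgebra

n = 4
tri() = Tridiagonal(rand(n - 1), rand(n), rand(n - 1))
A, B, C = tri(), tri(), tri()

A * (B * C)   # fine: B*C is a dense Matrix, and Tridiagonal * Matrix exists
(A * B) * C   # as reported: MethodError, no *(::Matrix, ::Tridiagonal)
```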

tkelman commented 9 years ago

We don't have a type for pentadiagonal (five-diagonal) matrices, so Tridiagonal*Tridiagonal returning dense doesn't surprise me. But it would probably surprise someone who wanted Julia to do that operation quickly, without specific knowledge of the internals.

jiahao commented 9 years ago

I did propose support for arbitrary banded matrices in #8240, so the specific issue I just mentioned can be addressed as part of the broader cleanup. However it does speak to the combinatorial growth problem.

tkelman commented 9 years ago

So, now that we have a parameterized Symmetric type, it's very quickly clear that Symmetric{SparseMatrixCSC} has frustratingly few methods defined.

tkelman commented 9 years ago

Leaving a breadcrumb to a bit of related discussion here: https://github.com/JuliaLang/julia/pull/10902#issuecomment-94307668

I'm thinking that some of the CartesianIndex / eachindex infrastructure could be adapted and specialized for sparse and other structured matrix types; a hypothetical sketch follows. Could be worth some experimentation and prototyping.
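Purely hypothetical (eachstored is not an existing API; nzrange and the CSC fields are real): a stored-entry analogue of eachindex that a generic reduction could consume.

```julia
using SparseArrays

# hypothetical eachindex-style iterator over only the stored entries
# of a CSC matrix, yielding (row, col, value) triples
eachstored(A::SparseMatrixCSC) =
    ((A.rowval[k], j, A.nzval[k]) for j in 1:size(A, 2)
                                  for k in nzrange(A, j))

A = sprand(5, 5, 0.3)
for (i, j, v) in eachstored(A)
    println("A[$i, $j] = $v")
end
```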

ViralBShah commented 5 years ago

These discussions have generally moved to Discourse, and I am closing this issue since there isn't anything directly actionable here.