Extended semantics for Matrix_Dup and Vector_Dup

manoj63 commented 2 years ago

Premise

GraphBLAS implementations need to provide a choice for multiple data-structures of sparse and hyper sparse matrices and vectors to address performance concerns. CSR, CSC, COO, doubly indexed or doubly linked lists, etc. are some examples. The need to support Dynamic Graphs exacerbates the need.
Choice of the optimal representation in GraphBLAS C-API 3.0 timeframe will not be automatically discerned by a GraphBLAS runtime. The application programmer can provide hints to the GraphBLAS programmer about the optimal choice of the data structure based on his knowledge of the sparsity of the matrices and fill patterns.
The possibility of Rich Metadata associated with GraphBLAS matrices, a potential companion proposal is, highly synergistic with this proposal. The components of this Rich Metadata will comprise of data-structure of an GraphBLAS matrix or vector; run-time statistics of the operators, and in case of binary operators on GraphBLAS matrices, the left or right operands used with this GraphBLAS matrix or vector; and possibly information about the statistics on non-zero entries of the vectors or matrices.

Proposal: Main idea

This proposal targets GraphBLAS C-API 2.1. The main idea is to extend the signature of the Matrix_Dup and Vector_Dup instructions to include an additional and optional parameter (hint), the data-structure recommended by the application programmer. The main idea is to extend the signature of the Matrix_Dup and Vector_Dup instructions to include an additional and optional parameter, the data-structure recommended by the application programmer.

While the above extension is a succinct API call to change the data-structure underlying a GraphBLAS matrix or vector, there are additional or alternative approaches to achieve the same effect, namely

Use Matrix_new with the additional parameter or hint followed by a vector or matrix assign
Use Matrix_new followed with (GrB set?) with the additional parameter or hint followed by a vector or matrix assign

Problem Solved

Choice of data-structure for matrices and vectors optimized for the operation being performed on them, and for the sparsity and distribution of non-zero elements, the choice being dictated by application programmer input

Details to be worked out

The parameter values need to be standardized, i.e, the GraphBLAS standard needs to specify an ontology for the optional parameter values which establish the domain for the parameter values
Since the calls discussed above can also be made without the optional parameter, in which case the library makes the choice of the data-structure to be used, the application program must be able to query the data-structure being used
The application programmer should be able to query the data-structures permissible or supported in a GraphBLAS library
Are there optional parameters other than the data-structure to be used. Should this collection of parameters be integrated into a single descriptor? (Note that the term descriptor used in GraphBLAS API call signatures for a different purpose, so we need a name for this collection of optional parameters.)
Instead of embedding optional parameters in GraphBLAS API calls, shall we create an orthogonal channel for communication of pragmas from (or between) the application program and a GraphBLAS library.

DrTimothyAldenDavis commented 2 years ago

Lots of details to work out of course, but this is looking great. I think having several options would be best: adding the hint in matrix and vector dup is a great idea, since it allows the user application to make a copy of a matrix / vector but in a different (hint hint) format. I don't have this feature myself since I was trying to implement my hints with as little change to the API as possible, but this is an important feature to consider.

It would also be important to allow for a hint to be given for an existing matrix, like my current GxB_set methods.

I think the idea of the orthogonal channel for these hints would be a good idea, as best we can. Matrix / vector dup would need it in the API, perhaps in the descriptor (which could be added to the Matrix_dup and Vector_dup signatures).

I think adding hints like this to the GrB_Descriptor is great way to add these features.

mcmillan03 commented 2 years ago

The C++ API is also working on hints for all constructors and we are considering a number of orthogonal "dimensions" to these hints:

row vs. col
sparse vs. dense vs. hypersparse vs iso (or structure only) etc...
static vs. dynamic

These hints won't directly dictate the data structure.

rayegun commented 2 years ago

We should meet and discuss this between both committees then? Would be nice to converge on something at least close. We're having a very similar discussion, although we've been very wary of standardizing the hints. Maybe we're being too conservative, especially if they're just hints.

BenBrock commented 2 years ago

Here are the hints we've defined thus far in the C++ API. I'd be happy to chat about defining some set so we can have a rough correspondence between the C and C++.

These C++ hints are compile-time hints, but conceptually they're exactly the same thing as your hints (just your hints could be chosen based on runtime information).

BenBrock commented 2 years ago

Also, to provide some context around these hints, the idea is that they should be composed to indicate different sparse data structures.

grb::compose<grb::sparse, grb::row> -> potentially CSR grb::compose<grb::sparse, grb::column> -> potentially CSC grb::compose<grb::dense, grb::row> -> potentially dense row-major

You could do something similar in C, either by OR'ing them together GrB_SPARSE | GrB_ROW or by having your own compose. (Combinations of hints are more limited with OR'ing, at least in C.)

I am a big fan of standardizing hints. With non-standard hints, code becomes non-portable. If I write my code using SuiteSparse GraphBLAS and a bunch of SuiteSparse's non-standard hints, my code will no longer compile and run with other GraphBLAS implementations. My code is now ill-formed according to the GraphBLAS standard.

However, if I have standardized hints, code is portable between implementations. Hints in the C++ API do not cause any semantic change in the matrix or vector data structures. This makes them effectively optional for implementers. A conformant implementation need only compile and run with all the standard hints, as the hints do not cause any semantic change in the data structure. Good implementations should, however, document what the effect on performance might be of various hints.

DrTimothyAldenDavis commented 2 years ago

I have a simple composition of my hints: I can sum them. SPARSE + BITMAP + FULL means that the matrix can be either sparse, bitmap, or full, but not hypersparse. HYPERSPARSE + BITMAP means it can be hypersparse or bitmap (my choice inside GraphBLAS) but I am being told "hint: not sparse and not full".

It's just a hint of course. If I'm told "only FULL", then I treat that as "OK, how about make it BITMAP + FULL" because if an entry is missing, I can't populate with some value to force it to be full.

My hints cause no semantic change to the matrix or vector at all. This is very important. The hints are completely optional, and if I ignore them, or if they are not given to me at all, the results are the same. Just the internal data structure might differ, and the performance differs.

The same thing holds for CSR vs CSC. My default is CSR, but the user application can ignore that. Semantics are the same regardless of how the matrix or vector is stored: by row or by column. ONly difference is that I never change this myself, automatically, because it would be a very tricky heuristic.

rayegun commented 2 years ago

I think given the C++ committee wants to standardize these hints I will push to standardize these hints as part of the GrB_get/set addition.

DrTimothyAldenDavis commented 2 years ago

For your C++ hints, would you assume the "dense" hint would be like my GxB_SPARSE + GxB_FULL, or perhaps GxB_BITMAP + GxB_FULL? In other words the hint is "I suggest that you store this as a dense vector, but of course if some entries are not present, you can ignore this hint and store in some manner that preserves its sparsity structure". Right?

BenBrock commented 2 years ago

Correct. "Dense," like all the other hints, is just a hint that using dense storage might be more efficient. The implementation would still be required to keep track of which indices contain stored values and which don't. That would presumably require either a bitmap, list of indices, etc., alongside dense storage. And, of course, the implementation can use any sparse matrix format it desires, since it's just a hint.

GraphBLAS / graphblas-api-c