GraphBLAS / graphblas-api-c

Other
7 stars 3 forks source link

Extended semantics for Matrix_Dup and Vector_Dup #71

Open manoj63 opened 1 year ago

manoj63 commented 1 year ago

Premise

Proposal: Main idea

This proposal targets GraphBLAS C-API 2.1. The main idea is to extend the signature of the Matrix_Dup and Vector_Dup instructions to include an additional and optional parameter (hint), the data-structure recommended by the application programmer. The main idea is to extend the signature of the Matrix_Dup and Vector_Dup instructions to include an additional and optional parameter, the data-structure recommended by the application programmer.

While the above extension is a succinct API call to change the data-structure underlying a GraphBLAS matrix or vector, there are additional or alternative approaches to achieve the same effect, namely

Problem Solved

  1. Choice of data-structure for matrices and vectors optimized for the operation being performed on them, and for the sparsity and distribution of non-zero elements, the choice being dictated by application programmer input

Details to be worked out

DrTimothyAldenDavis commented 1 year ago

Lots of details to work out of course, but this is looking great. I think having several options would be best: adding the hint in matrix and vector dup is a great idea, since it allows the user application to make a copy of a matrix / vector but in a different (hint hint) format. I don't have this feature myself since I was trying to implement my hints with as little change to the API as possible, but this is an important feature to consider.

It would also be important to allow for a hint to be given for an existing matrix, like my current GxB_set methods.

I think the idea of the orthogonal channel for these hints would be a good idea, as best we can. Matrix / vector dup would need it in the API, perhaps in the descriptor (which could be added to the Matrix_dup and Vector_dup signatures).

I think adding hints like this to the GrB_Descriptor is great way to add these features.

mcmillan03 commented 1 year ago

The C++ API is also working on hints for all constructors and we are considering a number of orthogonal "dimensions" to these hints:

These hints won't directly dictate the data structure.

rayegun commented 1 year ago

We should meet and discuss this between both committees then? Would be nice to converge on something at least close. We're having a very similar discussion, although we've been very wary of standardizing the hints. Maybe we're being too conservative, especially if they're just hints.

BenBrock commented 1 year ago

Here are the hints we've defined thus far in the C++ API. I'd be happy to chat about defining some set so we can have a rough correspondence between the C and C++.

These C++ hints are compile-time hints, but conceptually they're exactly the same thing as your hints (just your hints could be chosen based on runtime information).

BenBrock commented 1 year ago

Also, to provide some context around these hints, the idea is that they should be composed to indicate different sparse data structures.

grb::compose<grb::sparse, grb::row> -> potentially CSR grb::compose<grb::sparse, grb::column> -> potentially CSC grb::compose<grb::dense, grb::row> -> potentially dense row-major

You could do something similar in C, either by OR'ing them together GrB_SPARSE | GrB_ROW or by having your own compose. (Combinations of hints are more limited with OR'ing, at least in C.)

I am a big fan of standardizing hints. With non-standard hints, code becomes non-portable. If I write my code using SuiteSparse GraphBLAS and a bunch of SuiteSparse's non-standard hints, my code will no longer compile and run with other GraphBLAS implementations. My code is now ill-formed according to the GraphBLAS standard.

However, if I have standardized hints, code is portable between implementations. Hints in the C++ API do not cause any semantic change in the matrix or vector data structures. This makes them effectively optional for implementers. A conformant implementation need only compile and run with all the standard hints, as the hints do not cause any semantic change in the data structure. Good implementations should, however, document what the effect on performance might be of various hints.

DrTimothyAldenDavis commented 1 year ago

I have a simple composition of my hints: I can sum them. SPARSE + BITMAP + FULL means that the matrix can be either sparse, bitmap, or full, but not hypersparse. HYPERSPARSE + BITMAP means it can be hypersparse or bitmap (my choice inside GraphBLAS) but I am being told "hint: not sparse and not full".

It's just a hint of course. If I'm told "only FULL", then I treat that as "OK, how about make it BITMAP + FULL" because if an entry is missing, I can't populate with some value to force it to be full.

My hints cause no semantic change to the matrix or vector at all. This is very important. The hints are completely optional, and if I ignore them, or if they are not given to me at all, the results are the same. Just the internal data structure might differ, and the performance differs.

The same thing holds for CSR vs CSC. My default is CSR, but the user application can ignore that. Semantics are the same regardless of how the matrix or vector is stored: by row or by column. ONly difference is that I never change this myself, automatically, because it would be a very tricky heuristic.

rayegun commented 1 year ago

I think given the C++ committee wants to standardize these hints I will push to standardize these hints as part of the GrB_get/set addition.

DrTimothyAldenDavis commented 1 year ago

For your C++ hints, would you assume the "dense" hint would be like my GxB_SPARSE + GxB_FULL, or perhaps GxB_BITMAP + GxB_FULL? In other words the hint is "I suggest that you store this as a dense vector, but of course if some entries are not present, you can ignore this hint and store in some manner that preserves its sparsity structure". Right?

BenBrock commented 1 year ago

Correct. "Dense," like all the other hints, is just a hint that using dense storage might be more efficient. The implementation would still be required to keep track of which indices contain stored values and which don't. That would presumably require either a bitmap, list of indices, etc., alongside dense storage. And, of course, the implementation can use any sparse matrix format it desires, since it's just a hint.