GraphBLAS / graphblas-api-c

Other
7 stars 3 forks source link

PAIR operator: essential for best performance on many problems #46

Closed DrTimothyAldenDavis closed 2 years ago

DrTimothyAldenDavis commented 3 years ago

Please consider the GrB_PAIR_T as a GrB_BinaryOp that computes f(x,y)=1, for all built-in types T. GrB_PAIR_BOOL would return true. This operator is essential for implementing asymptotically fast semirings. For example a PLUS_PAIR semiring (even user-defined, if PAIR is built-in) takes O(k) time to compute y = A*x when x is a fully-populated vector (all entries present), and where A is Hypersparse CSR, with m by n with k nonempty rows with k << m being possible. I can't do that with a user-define PAIR operator since I can't tell that it always returns a 1. The time instead for a user-defined PAIR operator would be Omega(e) if e = nvals(A), and of course k << e is common.

The operator is also very powerful when computing "symbolic" results, where all the entiries present in a GrB_Matrix have the same value. Consider C =A*B with the MIN_PAIR_INT64 semiring. I can infer from the MIN monoid and the built-in PAIR operator that the matrix C will be constructed with values all equal to one. All I need to do is the symbolic computation for the pattern of C.

Unweighted graphs are common, and the PAIR operator allows for fast methods to compute them ("fast" as in fast in practice, and sometimes asymptotically fast as well).

To get this performance, I don't need to have built-in semirings like GrB_PLUS_PAIR_SEMIRING_INT64. The user can create that if they like, and it will be just as fast as my built-in semiring. I can detect if the user creates a semiring with the built-in PLUS and the built-in PAIR, for example.

I can do so much with the addition of the built-in GrB_PAIR_T operators, and it is trivial to define and include in the spec.

If you like, you can call it GrB_ONE_T and ignore my request to add f(x)=1 as a built-in unary operator. This operator is more important.

mcmillan03 commented 2 years ago

List any new predefined semirings that would include this new PAIR operator.

DrTimothyAldenDavis commented 2 years ago

Adding predefined semirings is not critical, and not required (for me) to get this performance from the PAIR operator. For example, a PLUS_PAIR_T semiring can be created by the user with:

GrB_Semiring_new (&semiring, GrB_PLUS_MONOID_INT64, GrB_PAIR_INT64) ;

and this would be identical to my predefined GxB_PLUS_PAIR_INT64 semiring.

There are 2 sets of semirings that would be useful, however, if you want to add them. The first set would be PLUS_PAIR_T for all built-in types T. The other set would use any "nice" monoid (MIN, MAX, LOR, LAND, BOR, BAND, and my ANY monoid). I use this word "nice" internally. A "nice" monoid has the property where if given a set of the same values to reduce, say {2, 2, 2, 2, 2}, the result is always 2. The min of this set is 2 for example.

The PAIR operator is powerful because it always create a set of 1s to be reduced by the monoid, as {1, 1, 1, ..., 1}. The PLUS monoid simply returns the size of this set. A "nice" monoid returns 1 when reducing this set to a scalar.

Monoids that are "nice with PAIR" include EQ and TIMES. A TIMES monoid when givent the set {1, 1, ..., 1} from the PAIR operator will return 1, as well. So, for the PAIR operator, for any built-in type T, the MIN_PAIR_T, MAX_PAIR_T, and TIMES_PAIR_T all return the same answer. I compute them all with my ANY_PAIR_T semiring for any built-in type T.

If you include the ANY monoid in the spec, then I would say add the ANY_PAIR_T semirings. But that's not essential; internally, I map all these "nice" monoid to my "ISO semiring", where the resulting matrix C has the property that all entries are the same. This "ISO" semiring just does the symbolic work and only needs O(1) numerical work to compute the iso value of C.

My "ISO semiring" is ANY_PAIR_BOOL, but LOR_PAIR_BOOL works fine too. MIN_PAIR_T for any built-in type T, or TIMES_PAIR_T would be good. All these map to a single "ISO semiring".

DrTimothyAldenDavis commented 2 years ago

I've added this to the V2 Project items. It is really essential for best performance, and truly trivial to add to the spec.

I don't need any semirings, just the binary GrB_PAIR_T operator for all built-in types T, which returns f(x,y)=1 for all inputs (the output has the same type as the inputs). Call it anything you like.

With this built-in operator I can create asympotically fast methods for all kinds of computations, including computing the row/column degree, and doing symbolic computations where only the pattern of the result is needed (all entries present become equal to 1). I can't do this quickly with a user-defined operator that does the same thing, since I have no idea that the user-defined operator is the simple f(x,y)=1. It has to be built-in to be fast.

DrTimothyAldenDavis commented 2 years ago

All I need is a single line added to Table 3.5:

operator type: GrB_BinaryOp GraphBLAS identifier: GrB_PAIR_T or if you like GrB_ONE_T Domains: TxT -> T Description: f(x,y) = 1 (or true for GrB_PAIR_BOOL) comment: one

Give me that, and I will give you GrB* semirings that are blazing fast. I will give you blazing fast iso-valued matrices that take very little memory and that are more powerful and easier to add to the spec that the Pattern-only matrices proposed in #25 (that issue should be closed; it is too disruptive to the spec and its effect can be obtained merely by adding this simple binary operator).

Give me GrB_PAIR (or name it what you like) and I can do all that with pure GrB* methods. Nothing else is needed to be added to the spec to get this performance and to get the same expressiveness of pattern-only matrices.

I cannot give you this blazing fast, low-memory performance with a user-defined operator of the form f(x,y)=1, since I cannot know that the function has that form.

DrTimothyAldenDavis commented 2 years ago

It’s just like any other binary op function in terms of the type casting rules. This function follows all the established rules. That is why this function only needs a single line added to the spec. It is not special in that regard. It is totally vanilla.

Here is an example of its use (see also the LAGraph_Property_RowDegree) function at https://github.com/GraphBLAS/LAGraph/blob/reorg/src/utility/LAGraph_Property_RowDegree.c

Suppose the matrix A is nrows-by-ncols, with e entries, and suppose A has k non-empty rows with k << nrows, and k << e. Also suppose A is held internally in CSR format. Then with just this GxB_PAIR_INT64 binary operator as an extension to the spec, I can currently compute the row degree in merely O(k) time and O(k) memory:

// x = all-zero vector of size ncols-by-1.  I do this in O(1) time and space in pure GrB*:
GrB_Vector_new (&x, GrB_INT64, ncols) ;
GrB_assign (x, NULL, NULL, 0, GrB_ALL, ncols, NULL) ;

// create the plus-pair-int64 semiring
GrB_Semiring_new (&plus_pair_int64, GrB_PLUS_MONOID_INT64, GxB_PAIR_INT64) ;

// compute the row degree, in only O(k) time with k << nrows, k << ncols, and k << e.
GrB_Vector_new (&rowdegree, GrB_INT64, nrows) ;
GrB_mxv (rowdegree, NULL, NULL, semiring, A, x, NULL) ;

The only thing in the above code that is GxB is the trivial GxB_PAIR_INT64 operator, which could be added to the spec in a single line in Table 3.5. By stark contrast, doing this in vanilla GrB with no GrB_PAIR_INT64 operator takes O(e) time and O(e) memory where k << e is common. The difference is vast.

Adding the GrB_PAIR_INT64 operator will allow for asymptotically fast computation of the row and column degree, and so many other things as well.

It just has a profound impact on GraphBLAS. The 3 most important binary ops in GraphBLAS are FIRST, SECOND, and this new one PAIR. These functions are even more important than PLUS and TIMES. This is because many many graph algorithms ignore edge weights.

This binary op f(x,y)=1 acts something like FIRST (x,y) = x, which is agnostic to the value of y but in GrB mxm in a semiring is only applied to as FIRST (aik, bkj) where both aik and bkj exist. The PAIR operator is only applied in a semiring as PAIR(aik,bkj) where both aik and bkj exist, just like any other op. It just happens to always return 1, ignoring the values of both its inputs.

That’s why I called it PAIR. Also because I already had a unary op GxB_ONE_T, but I’m not recommending that to be added to the spec. So f(x,y)=1 could be called GrB_ONE_T as a binary op, if you like.

mcmillan03 commented 2 years ago

We will be adding GrB_ONEB_T to the list of binary ops.

mcmillan03 commented 2 years ago

Added.