GraphBLAS / graphblas-api-c

Other
7 stars 3 forks source link

BB-37: GrB_Any Monoid and/or GrB_Either BinaryOp #21

Open mcmillan03 opened 3 years ago

mcmillan03 commented 3 years ago

Tim Davis wrote:

Many algorithms use a MIN-SECOND or MIN-FIRST semiring, to find the parent in the BFS tree, for example. The MIN is not always strictly required. A faster monoid would be based on an ANY operator, which does $z=f(x,y)$, returning $x$ or $y$, arbitrarily. This function can be turned into a monoid (the identity is implicit), and it can always terminate early, on the first value it finds.

This is needed if GraphBLAS is to be asymptotically efficient in a push/pull BFS that returns a tree (for the GAP benchmark for example).

Do we also need to define GrB_ANY binary op?

The ANY monoid has a curious feature, however. It has "no" identity value, or "any" identity value, depending on how you look at it. That is, f(,x) = f(x,) = x for any value "*".

This is no problem for a semiring when doing C=AB. It turns out that the identity value of a monoid is almost never needed. I use it in Gustavson's method for C=AB, but only for convenience; I could actually ignore the identity value there. I don't use the identity value for C=A*B when using dot products. The only place where the identity value of a monoid is actually required is GrB_reduce to scalar, when the matrix or vector has no entries at all:

GrB_reduce (&scalar, ..., GrB_ANY, A , ...)

would return any one entry from A, as it likes. But what can it do if the matrix has no entries? What is the return value of the scalar? Any value? zero? No change to the scalar?

This is yet another case as to why GraphBLAS really needs a GrB_Scalar object. GrB_reduce to scalar destroys non-blocking mode because of the ugly user scalar, and it also causes a problem for the ANY monoid, which would like to return a 'nothing'. If GrB_reduce to scalar returned a GrB_Scalar object (which is like a 1-by-1 sparse matrix) then it could be an empty scalar, with no value, like "scalar = sparse(0)" in MATLAB.

ANY is commutative in the way we define it: it returns the correct output regardless of the order of operands. ANY(x,y)=ANY(y,x)= return any of x or y.

I think NOTHING or NO_VALUE is the right identity in this case:

f(NOTHING, x) = x, f(x, NOTHING) = x, f(NOTHING, NOTHING) = NOTHING


From: Tim Davis [mailto:davis@tamu.edu] Sent: Tuesday, July 23, 2019 9:54 AM To: Tim Davis davis@tamu.edu Cc: Aydin Buluc abuluc@lbl.gov; Carl Yang ctcyang@ucdavis.edu; Gabb, Henry A henry.a.gabb@intel.com; John Gilbert gilbert@cs.ucsb.edu; Jose Moreira jmoreira@us.ibm.com; Kepner, Jeremy - 0553 - MITLL kepner@ll.mit.edu; Mattson, Timothy G timothy.g.mattson@intel.com; Scott McMillan smcmillan@sei.cmu.edu Subject: Re: operators of an unusual kind

I think "EITHER" would be good. I do see the problem with ANY, since f = ANY(3,4) could be misunderstood as "ignore the input arguments, just return any result in the domain: pi, zero, +infinity, I don't care".

So the multiply operator could be z = either(x,y), which is defined as "either(x,y) = either x or y, as determined by the function itself".

I think that's clear ... any thoughts on either?

The other problem with "any" is that Python, R, Octave, and MATLAB define it as the logical or of a boolean array. So if we used the word "ANY", then someone familiar with those languages could be confused.

Those languages don’t have an EITHER function so there should be no confusion

DrTimothyAldenDavis commented 3 years ago

The ANY operator and the corresponding ANY monoid are powerful, and the only way to get close to the performance of the GAP Benchmark for BFS. For example the GAP benchmark for BFS on the kron matrix is about 0.3 seconds on my desktop. My BFS with the ANY_SECONDI semiring takes about 0.5 seconds, so I'm already slower than Scott Beamer's benchmark, but getting quite close. If use the MIN_SECONDI semiring, the time increases to 1.5 seconds (and that is by using the GxB_SECONDI operator to return the parent index, as an "IndexBinaryOp" it might be called in the spec).

So just using the ANY monoid instead of the MIN monoid speeds up the BFS by a factor of 3. That is huge.

The vanilla BFS (without any GxB extensions) is even slower, at around 6 or 7 seconds.

The operator is simple to define for the spec, just z = f(x,y) = x, or y, the library gets to pick at its discretion (it may or may not be deterministic, solely at the discretion of the internal decisions of the library). Any library that needs the GrB_ANY_T as a binary operator can just replace it with GrB_FIRST_T or GrB_SECOND_T as they like (either is fine and meets the spec). Any library that needs to compute with the GrB_ANY_MONOID would need to add its logic, but the logic is simple when used inside a matrix-multiply algorithm or in a GrB_reduce.