GraphBLAS / graphblas-api-c

Other
7 stars 3 forks source link

BB-23: User-defined monoids with terminal values #11

Open mcmillan03 opened 3 years ago

mcmillan03 commented 3 years ago

See Suitesparse's GxB_Monoid_terminal_new to create a monoid, and GxB_Monoid_terminal (return the monoid terminal value (if any)).

In the dot product formulation of the matrix multiply, a dot product can test if the value is such that early termination is possible. For example, for the logical or, if the value is true, then no further work is needed. This is very important for BFS and the boolean monoids. I can do this without any change to the spec or the results of my codes ... they would just be lots faster.

Early termination can also be exploited in GrB_reduce. The gain in performance can be extreme (I haven't done it yet but I'm working on it).

Almost half the monoids for built-in operators can terminate early: OR (when reaching what I call a "terminal value" of 'true') AND (terminal value is false) MIN on any integer type: the smallest integer (INT8_MIN for int8, for example) MAX on any integer: the terminal value is the largest integer

Once the "terminal value" is reached, no further operations are needed.

The PLUS and TIMES monoids cannot terminate (I exclude boolean from those, since TIMES_BOOL is the same as AND_BOOL anyway). This is because even integers need to overflow and wrap around. Also, the boolean EQ and XOR monoids can't terminate (and the NE_BOOL being identical to XOR_BOOL). Well, maybe GrB_PLUS_FP32 could terminate if it reaches +INFINITY, but then INFINITY + NaN is supposed to be NaN, according to the IEEE standard.

I'm [Tim Davis is] suggesting (the 'annihilator'). But the term 'terminal value' might be a better name because of your idea that GrB_MIN_FP32 could halt if it sees a zero, because you know all your values are positive. Zero is technically not the annihilator but it could be (non-default) stated as the 'terminal value', say in a user-defined Monoid.

Given a function z=f(x,y), the value a (annihilator or 'terminal value') is that value such that f(x,a) = f(a,x) = a for any x.

Your list is correct, except not for GrBPLUS and GrBTIMES (except BOOL). For floating-point, Nan behaviour is important to consider because of the IEEE 754 standard. For min and max, with the 'omitnan' behaviour (the default in MATLAB), min(x,nan)=x, and so nans are ignored. This means and NaNs not considered can also be ignored. Thus GrB_MIN_FPx has a 'terminal value' of -infinity. If min and max have the 'includenan' behavior, then min(x,nan)=nan and the terminal value is NaN ... which is not really that useful for performance (NaNs would be uncommon).

Good point though about giving the user the ability to redefine the 'terminal value', like GrB_MIN_FP32 halting if it finds a zero, if you know your values are all positive. I think a built-in GrB_MIN_FP32_MONOID should use -Inf as the terminal value.

But for PLUS and TIMES, x+NaN should always be NaN. That's why I didn't include GrB_PLUS_FPx as having a terminal value. NaNs are rare and it's not worth testing to see if you have one to exit early from a monoid reduction.

For GrB_PLUS_INTx, I assumed that the behavior of integer arithmetic is modular, like it is in standard C. So for uint8_t x = 255, if you compute x=x+1, it becomes zero. As a result, there is no terminal value for GrB_PLUS_INTx or unsigned GrB_PLUS_UINTx. Someone writing in C would be surprised to see that 255+1 gives 255. It does in MATLAB but the GraphBLAS API would be read as 255+1 = 0 for uint8_t, I think. The GrB_PLUS_BOOL operator is to GrB_LOR ... which is what a C compiler does (true + true is true).

tgmattso commented 3 years ago

This makes sense to me.

manoj63 commented 1 year ago

The value of Monoid_terminal is in applying a function iteratively until the terminal value is reached. Two points: 1) We can be more general - we should define the fixed point function which takes a function f, and applies f repeatedly until a specified value is reached. 2) Takes a function f, and applies f repeatedly until a fixed point is reached, i.e., the value is converged.

:-) I am borrowing this idea from the GPI specification and will be happy to write it in an issue.