GraphBLAS / graphblas-api-c

Other
7 stars 3 forks source link

BB-30: eWiseAdd with extra scalars for when entries are missing (and the implied zero does not work). #16

Open mcmillan03 opened 3 years ago

mcmillan03 commented 3 years ago

Extend eWiseAdd (call it another name; do not change eWiseAdd) so it can take extra scalars that define how the operator works on the set difference. (when $a{ij}$ is present but not $b{ij}$, and visa versa).

Right now, eWise add does something like:

if aij and bij are both in the pattern
    cij = op (aij, bij)
else if aij is in the pattern
    cij = aij
else if bij is in the pattern
    cij = bij

What about a GxB_eWiseAdd_whatever that took two additional inputs: a GrB_Matrix amissing and GrB_Matrix bmissing, that are 1-by-1 matrices. If NULL, or if they have no entries, then they act like eWiseAdd above.

if aij and bij are both in the pattern
    cij = op (aij, bij)
else if aij is in the pattern
     if bmissing is given
          cij = op (aij, bmissing)
     else
          cij = aij.
else if bij is in the pattern
    if amissing is given
         cij = op (amissing, bij)
    else
         cij = bij

For the functions like GrBeWiseAdd*_Monoid, the missing value amissing = bmissing = Monoid->identity. Likewise for the Semiring variants.

So this would only be available for as a variant of GrBeWiseAdd*_BinaryOp.

if the operator was "less than", and amissing = -infinity, then "nothing in aij" would mean "bij is bigger", so the result would be true. If bmissing = +infinity, then "nothing in bij" would mean "aij" is smaller" so the result is false.

For the "minus" operator, if the matlab expression C=A-B is desired, the missing values are zero in both case.

It's tempting to pass in an actual scalar instead of 1-by-1 matrices. But scalar args are really nasty (witness all the GrB_method_TYPE functions ... there could now be two types, one for each scalar).

mcmillan03 commented 3 years ago

So far every time I have used eWiseAdd with non-commutative operators I have found workarounds such that this enhancement is not required:

Examples:

  1. with the MINUS operator if you apply AINV to the rhs and then perform ADD it accomplishes the same thing. (this is an extra operation).
  2. with "less than" operator, if you apply one of the operands as the mask to eWiseAdd you will get the correct result.

One needs to weigh increased API against terser/clear code, and we need to find examples for which there are no good workarounds.

tgmattso commented 3 years ago

I am nervous about adding the bmissing and cmissing extension unless we have real applications where this is needed. I like sticking to the mental model of ewise_add as a set union operation and the *missing proposal breaks that.

DrTimothyAldenDavis commented 3 years ago

We do have applications that need it: the Python & Julia developers in particular, would like this feature. My MATLAB implementation of C=A-B is slow compared to the built-in C=A-B, precisely because of this issue.

My MATLAB interface would use this for the MATLAB expressions A < B, A > B, min(A,B), max(A,B), A != B, A.^B, A-B, atan2(A,B), bitset(A,B), bitshift(A,B), complex(A,B), and hypot(A,B). All those are slower because of the lack of this feature.

See https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable/GraphBLAS/%40GrB/private/gb_union_op.m for details.

I should probably implement this as a GxB_Matrix_eWiseUnion operation first, and see what the difference in performance is, before we consider it for the spec.

DrTimothyAldenDavis commented 2 years ago

This keeps the set union feature. EwiseAdd does the following:

if A(i,j) exists but B(i,j) does not:  C(i,j) = A(i,j)
else if A(i,j) does not exist but B(i,j) does:  C(i,j) = B(i,j)
else both exist:  C(i,j) = op (A(i,j), B(i,j))

The suggested eWiseUnion would do this instead. The pattern of the result of eWiseAdd and eWiseUnion would be the same. Just the values differ:

if A(i,j) exists but B(i,j) does not:  C(i,j) = op (A(i,j), bscalar)
else if A(i,j) does not exist but B(i,j) does:  C(i,j) = op (ascalar, B(i,j))
else both exist:  C(i,j) = op (A(i,j), B(i,j))
DrTimothyAldenDavis commented 2 years ago

I've drafted a GxB_Matrix_eWiseUnion (C, Mask, accum, binaryop, A, Amissing, B, Bmissing, descriptor) where Amissing and Bmissing are GrB_scalars, and also GxB_Vector_eWiseUnion. Still in testing stages but it's passing my initial tests. The code is almost identical to GrB_eWiseAdd, except for a single point in the innermost loops, so I can use the same code base for both eWiseAdd and GxB*eWiseUnion. Here's the draft signature; if anyone has any feedback please let me know:

https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/e33bf54dcd342f216808db00861c410d0fd93ca5/Include/GraphBLAS.h#L5190

It's too much to add variants using C scalars; there would be 13x13 = 169 of them, since there are 2 scalar inputs, and then multiply that again by 3 if you want to pass in a Monoid, or a Semiring. That's way too much. I think it's simplest just to have 2 functions, as I have in my draft GraphBLAS.h, linked above.

I'll add this to my v5.2.0.alpha and v6.0.0.alpha prereleases, once it's fully tested and documented.

For the workarounds, with C=A-B, it's a lot slower to use GrB-apply followed by GrB_eWiseAdd. For C = A < B, the workaround you suggest above, Scott, doesn't work if A and/or B has negative entries. In that case, the entries in A but not B can be either true or false, (true if < 0 and false if >= 0).

I do have workaround for all of these, but they require a lot of extra work. I have to create a pair of temporary matrices A2 and B2 using assign, where A2 and B2 both have the pattern of the set union of A and B, and where A2 has been padded with the "Amissing" scalar and B2 is padded with "Bmissing". This works but it's a lot slower.

Python, Julia, and MATLAB all need this GxB_*_eWiseUnion, and they all have to use klugey workarounds without it. I'm open to another suggestion for the name.

DrTimothyAldenDavis commented 2 years ago

There's another limitation of GrB_eWiseAdd that this method fixes.

Say you have two matrices A and B of a user-defined type, and a user defined operator to compute "<", and you want to compute C = A<B using a set union operation. THis cannot be done with GrB_eWiseAdd. It's impossible because entries in A but not B are typecasted directly into C, bypassing the operator. The typecast from user-defined types to boolean is not permitted.

GrB_eWiseAdd is the only method using an operator, monoid, or semiring that allows entries to bypass the op / monoid / semiring and be typecasted directly into the output. Methods that do not use an op / monoid / semiring are not a problem (also GrB_select) since they typically do not need to typecast their input into the output matrix C anyway. For those methods (like GrB_transpose, GrBselect, GrB*_dup), the output is a subset or rearrangment (GrB_transpose) of their input, so it's natural to not do any typecasting.

eWiseMult does not have this problem since all entries that go to the output C (or the internal matrix T) all go through the operator. So with C boolean, A and B UDT, the computation C = A < B in a set intersection manner can be done just fine in GraphBLAS.

However, with C boolean, A and B UDT, the computation C = A < B simply cannot be done in a set union manner in GraphBLAS. It is impossible.

GxB_Matrix_eWiseUnion (C, Mask, accum, op, A, Amissing, B, Bmissing, descriptor)plugs this hole in the API.

DrTimothyAldenDavis commented 2 years ago

I've added GxB_Matrix_eWiseUnion and GxB_Vector_eWiseUnion to my v5.2.0.alpha13 ( https://github.com/DrTimothyAldenDavis/GraphBLAS/releases/tag/v5.2.0.alpha13 ) and v6.0.0.alpha13 ( https://github.com/DrTimothyAldenDavis/GraphBLAS/releases/tag/v6.0.0.alpha13 ) pre-releases, fully tested and documented.

mcmillan03 commented 1 year ago

Regarding naming (i.e., my $0.02)... After much thought over the years, it is unfortunate that the proposed extension is more aptly named eWiseAdd; where as the existing functionality is more aptly named eWiseUnion. The C++ API has an opportunity to go this route (not likely at the moment), but the conversation so far has been to consider function overloads -- with and without xmissing args.

Whatever is decided really needs to appear in the math document. I will try to take a first pass at adding the ewise section to the rough draft