Open mcmillan03 opened 3 years ago
So far every time I have used eWiseAdd with non-commutative operators I have found workarounds such that this enhancement is not required:
Examples:
One needs to weigh increased API against terser/clear code, and we need to find examples for which there are no good workarounds.
I am nervous about adding the bmissing and cmissing extension unless we have real applications where this is needed. I like sticking to the mental model of ewise_add as a set union operation and the *missing proposal breaks that.
We do have applications that need it: the Python & Julia developers in particular, would like this feature. My MATLAB implementation of C=A-B is slow compared to the built-in C=A-B, precisely because of this issue.
My MATLAB interface would use this for the MATLAB expressions A < B, A > B, min(A,B), max(A,B), A != B, A.^B, A-B, atan2(A,B), bitset(A,B), bitshift(A,B), complex(A,B), and hypot(A,B). All those are slower because of the lack of this feature.
See https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/stable/GraphBLAS/%40GrB/private/gb_union_op.m for details.
I should probably implement this as a GxB_Matrix_eWiseUnion operation first, and see what the difference in performance is, before we consider it for the spec.
This keeps the set union feature. EwiseAdd does the following:
if A(i,j) exists but B(i,j) does not: C(i,j) = A(i,j)
else if A(i,j) does not exist but B(i,j) does: C(i,j) = B(i,j)
else both exist: C(i,j) = op (A(i,j), B(i,j))
The suggested eWiseUnion would do this instead. The pattern of the result of eWiseAdd and eWiseUnion would be the same. Just the values differ:
if A(i,j) exists but B(i,j) does not: C(i,j) = op (A(i,j), bscalar)
else if A(i,j) does not exist but B(i,j) does: C(i,j) = op (ascalar, B(i,j))
else both exist: C(i,j) = op (A(i,j), B(i,j))
I've drafted a GxB_Matrix_eWiseUnion (C, Mask, accum, binaryop, A, Amissing, B, Bmissing, descriptor)
where Amissing and Bmissing are GrB_scalars, and also GxB_Vector_eWiseUnion. Still in testing stages but it's passing my initial tests. The code is almost identical to GrB_eWiseAdd, except for a single point in the innermost loops, so I can use the same code base for both eWiseAdd and GxB*eWiseUnion. Here's the draft signature; if anyone has any feedback please let me know:
It's too much to add variants using C scalars; there would be 13x13 = 169 of them, since there are 2 scalar inputs, and then multiply that again by 3 if you want to pass in a Monoid, or a Semiring. That's way too much. I think it's simplest just to have 2 functions, as I have in my draft GraphBLAS.h, linked above.
I'll add this to my v5.2.0.alpha and v6.0.0.alpha prereleases, once it's fully tested and documented.
For the workarounds, with C=A-B, it's a lot slower to use GrB-apply followed by GrB_eWiseAdd. For C = A < B, the workaround you suggest above, Scott, doesn't work if A and/or B has negative entries. In that case, the entries in A but not B can be either true or false, (true if < 0 and false if >= 0).
I do have workaround for all of these, but they require a lot of extra work. I have to create a pair of temporary matrices A2 and B2 using assign, where A2 and B2 both have the pattern of the set union of A and B, and where A2 has been padded with the "Amissing" scalar and B2 is padded with "Bmissing". This works but it's a lot slower.
Python, Julia, and MATLAB all need this GxB_*_eWiseUnion, and they all have to use klugey workarounds without it. I'm open to another suggestion for the name.
There's another limitation of GrB_eWiseAdd that this method fixes.
Say you have two matrices A and B of a user-defined type, and a user defined operator to compute "<", and you want to compute C = A<B using a set union operation. THis cannot be done with GrB_eWiseAdd. It's impossible because entries in A but not B are typecasted directly into C, bypassing the operator. The typecast from user-defined types to boolean is not permitted.
GrB_eWiseAdd is the only method using an operator, monoid, or semiring that allows entries to bypass the op / monoid / semiring and be typecasted directly into the output. Methods that do not use an op / monoid / semiring are not a problem (also GrB_select) since they typically do not need to typecast their input into the output matrix C anyway. For those methods (like GrB_transpose, GrBselect, GrB*_dup), the output is a subset or rearrangment (GrB_transpose) of their input, so it's natural to not do any typecasting.
eWiseMult does not have this problem since all entries that go to the output C (or the internal matrix T) all go through the operator. So with C boolean, A and B UDT, the computation C = A < B in a set intersection manner can be done just fine in GraphBLAS.
However, with C boolean, A and B UDT, the computation C = A < B simply cannot be done in a set union manner in GraphBLAS. It is impossible.
GxB_Matrix_eWiseUnion (C, Mask, accum, op, A, Amissing, B, Bmissing, descriptor)
plugs this hole in the API.
I've added GxB_Matrix_eWiseUnion and GxB_Vector_eWiseUnion to my v5.2.0.alpha13 ( https://github.com/DrTimothyAldenDavis/GraphBLAS/releases/tag/v5.2.0.alpha13 ) and v6.0.0.alpha13 ( https://github.com/DrTimothyAldenDavis/GraphBLAS/releases/tag/v6.0.0.alpha13 ) pre-releases, fully tested and documented.
Regarding naming (i.e., my $0.02)... After much thought over the years, it is unfortunate that the proposed extension is more aptly named eWiseAdd; where as the existing functionality is more aptly named eWiseUnion. The C++ API has an opportunity to go this route (not likely at the moment), but the conversation so far has been to consider function overloads -- with and without xmissing args.
Whatever is decided really needs to appear in the math document. I will try to take a first pass at adding the ewise section to the rough draft
Extend eWiseAdd (call it another name; do not change eWiseAdd) so it can take extra scalars that define how the operator works on the set difference. (when $a{ij}$ is present but not $b{ij}$, and visa versa).
Right now, eWise add does something like:
What about a GxB_eWiseAdd_whatever that took two additional inputs: a GrB_Matrix amissing and GrB_Matrix bmissing, that are 1-by-1 matrices. If NULL, or if they have no entries, then they act like eWiseAdd above.
For the functions like GrBeWiseAdd*_Monoid, the missing value amissing = bmissing = Monoid->identity. Likewise for the Semiring variants.
So this would only be available for as a variant of GrBeWiseAdd*_BinaryOp.
if the operator was "less than", and amissing = -infinity, then "nothing in aij" would mean "bij is bigger", so the result would be true. If bmissing = +infinity, then "nothing in bij" would mean "aij" is smaller" so the result is false.
For the "minus" operator, if the matlab expression C=A-B is desired, the missing values are zero in both case.
It's tempting to pass in an actual scalar instead of 1-by-1 matrices. But scalar args are really nasty (witness all the GrB_method_TYPE functions ... there could now be two types, one for each scalar).