TRIL, TRIU, DIAG, OFFDIAG, ROWLE, ROWGT, COLLE, and COLGT: only need _INT64 versions

DrTimothyAldenDavis commented 3 years ago

The proposed TRIL, TRIU, DIAG, and OFFDIAG functions come in 4 flavors: UINT32, UINT64, INT32, and INT64. These suffixes only affect the type of the "thunk" scalar input. The thunk is used only in index computations and the return value of these functions is bool, regardless of the suffix.

As a result, only the _INT64 versions are needed. The _UINT32, _UINT64, and _INT32 versions are either problematic (the unsigned integers cannot be reliably used in index computations) or superfluous (an int32 thunk is simply typecasted to an int64 anyway).

DrTimothyAldenDavis commented 3 years ago

Likewise (see other issue), the COLLE and related functions should NOT be defined for unsigned "thunk" values. All index computations should be done in int64. If these suggestions are accepted, then only the _INT64 suffix remains for TRIL, TRIU, DIAG, OFFDIAG, ROWLE, ROWGT, COLLE, and COLGT.

As a result, I would recommend simpler names: GrB_TRIL, GrB_TRIU, GrB_DIAG, GrB_OFFDIAG, GrB_ROWLE, GrB_ROWGT, GrB_COLLE, and GrB_COLGT, with no suffix. This is similar to the GrB_LOR binary operator, which has no suffix and whose types are all implicit.

mcmillan03 commented 3 years ago

I am confused by the comment "Likewise (see other issue), the COLLE and related functions should be defined for unsigned "thunk" values."

One can reference which "other issue" using hash followed by the issue number.

Did you really mean to say the thunk should be unsigned (exactly which types?) and that the computation should be done in signed? If the thunk is defined for more than one type then I cannot simplify the names of {COL,ROW}{LE,GT} operators. If you meant that the scalar (thunk) should be INT64 and computation should be carried out in INT64 then I have to figure out how to indicate that in the spec because the indices are UINT64.

DrTimothyAldenDavis commented 3 years ago

Oops. Sorry for the confusion. I left out the work "NOT". I edited my comment to fix the typo.

I mean to say that no IndexUnaryOp operator should do any index computation in unsigned arithmetic, and all of the thunk inputs for operators that depend on i and j should be signed integers.

I'm referring to #33 where I state that index computations in unsigned integers is problematic.

DrTimothyAldenDavis commented 3 years ago

Indices are GrB_Index in the spec, but the i and j values passed to the GrB_IndexUnaryOp could be typecasted first, to int64_t. This limits the maximum dimension of any matrix to be no larger than INT64_MAX, instead of UINT64_MAX, as a side effect. But it's the only way to have sane index computations in the GrB_IndexUnaryOp.

mcmillan03 commented 3 years ago

Should the thunk in ROWINDEX, COLINDEX, DIAGINDEX also be fixed to int64, but keep the output type user specified? Or should these also be just one type and then require casting to whatever the type of the output object is?

DrTimothyAldenDavis commented 3 years ago

I think the output type could be INT64 or INT32. There are cases where returning INT32 as the data type of the output matrix is a good idea. I've written both GrB_ROWINDEX_INT64 and GrB_ROWINDEX_INT32. I have similar int64 and int32 positional unary operators in GxB.

The value of the thunk would naturally be int32 for GrB_ROWINDEX_INT32, but I first cast the thunk to int64, do the index computations in int64, and then cast the result to int32.

DrTimothyAldenDavis commented 3 years ago

But I'm also OK if you want to keep with just the int64 version of GrB_ROWINDEX, COLINDEX, and DIAGINDEX.

DrTimothyAldenDavis commented 3 years ago

There are performance advantages for having just GrB_ROWINDEXINT32 and INT64 (also for COLINDEX and DIAGINDEX), as opposed to just INT64. INT32 as a data type is faster than INT64, and I use this to improve the GAP benchmarks. So I would prefer GrB[ROW/COL/DIAG]INDEX_INT[32 and 64].

The other ops (DIAG, OFFDIAG, TRIL, TRIU, ROWLE, ROWGT, COLLE, and COLGT) only need INT64 variants. If that is the only type, then the suffix can be dropped, as well, just like GrB_LOR has no BOOL suffix.

mcmillan03 commented 2 years ago

The suggestions will be adopted and the spec updated to reflect.

mcmillan03 commented 2 years ago

The suggests will be adopted mostly: pending experiments by Jose, indices are still GrB_Index and operations will be performed per the casting rules of the C language.

DrTimothyAldenDavis commented 2 years ago

Passing the indices as GrB_Index is fine; the typical usage will often them to int64 right away but that would not cause any kind of slowdown at all, I think.

mcmillan03 commented 2 years ago

From Jose's research:

This is from our favorite C document (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf, page 53):

"Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with the signed integer type is converted to the type of the operand with unsigned integer type."

In other words, if an uint64_t is operated with either an int64_t or an int32_t, then the signed integer is converted to unsigned. I also verified that the result is as expected:

int32_t foobar(uint64_t i, int32_t s)
{
    return i+s;
}

int32_t x = foobar((uint64_t)(5000), (int32_t)(-1));

produces

x = 4999

which I think is the only sensible result.

DrTimothyAldenDavis commented 2 years ago

This is still a problem. The ANSI C11 spec (v1570) states in section 6.3.1.3, on page 51, regarding the typecast of an unsigned integer (uint64 in this case) to signed int32_t (the return value of the function):

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

The example doesn't trigger this case, but the following modification does:

#include <stdint.h>
#include <stdio.h>

int32_t foobar(uint64_t i, int32_t s)
{
    printf ("i %lu s %d i+s %lu\n", i, s, i+s) ;
    return i+s;
}

int main (void)
{
    int32_t x = foobar((uint64_t)(5000), (int32_t)(-1));
    printf ("x %d\n", x) ;

    int32_t y = foobar((uint64_t)(5000), (int32_t)(-5001));
    printf ("y %d\n", y) ;
}

The output of the program is:

i 5000 s -1 i+s 4999
x 4999
i 5000 s -5001 i+s 18446744073709551615
y -1

The final value of y is "correct" but it is an implementation-dependent result. The unsigned 64-bit integer value of i+s is not in the range of the int32 result. It's even larger than the INT64_MAX. The uint64_t value of 18446744073709551615 is outside the range of the signed integer result, so the sentence in the ANSI C11 spec applies: the result is implementation dependent.

One solution to this problem is allow the indices to the function to passed in as GrB_Index types, or uint64_t, but then state that the function immediately typecasts them to int64_t, and then all integer index computations are then done in int64_t. This implies that no matrix can have a dimension greater than INT64_MAX, or about 2 to the 63. That's fine with me, but the spec should state that. I actually enforce the largest matrix dimension to be GxB_INDEX_MAX or less, which I define as 2 to the 60. See issue #34.

DrTimothyAldenDavis commented 2 years ago

In my definitions of these functions, I use this: https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/964e00baee52afdf4d596fc0d72ffd2cb00ac3dc/Source/Template/GB_ops_template.h#L1053

Removing the macros and simplifying, here is the function:

    inline void tril_function (bool *z, const void *unused, GrB_Index i, GrB_Index j, const int64_t *s)
    {
        (*z) = (((int64_t) j) <= (((int64_t) i) + (*s))) ;
    }

That is, I first typecast the GrB_Index inputs i and j to int64_t, then do the work in int64_t.

mcmillan03 commented 2 years ago

Reassign to Jose to discuss.

joseemoreira commented 2 years ago

Computer arithmetic, even with integral types, if full of surprises. Consider, for example, the following code fragment:

 48    if (1024UL < -1L) printf("1024 < -1\n");
 49    printf("INT64_MAX = %ld\n", INT64_MAX);
 50    printf("INT64_MAX + 1 = %ld\n", INT64_MAX+1L);

When I ran that on one of my machines, I get:

1024 < -1
INT64_MAX = 9223372036854775807
INT64_MAX + 1 = -9223372036854775808

The first two outputs should be universal, as they strictly follow deterministic rules of C. The third output is implementation specific, as adding the two unsigned longs caused an overflow (which in my machine just wrapped around). Neither the first nor the third result are intuitive. Note that the third result could happen in SuiteSparse when calling the tril_function above with i = 1, j = 0 and *s = INT64_MAX.

Therefore, I propose the following language for specifying the evaluation of functions in Table 3.6 (this would appear in the caption to the figure):

"The functions below that involve index arithmetic are to be evaluated in the real number field and the result converted to the output type. The behavior of the functions when the result cannot be represented in the output type, or when there are internal overflow/underflow situations during the evaluation, is implementation-defined."

The code fragment above for tril_function would be enough to fulfill the "implementation-defined" requirement, although the details of what happens (e.g., wrap-around, exception) could depend on the platform.

joseemoreira commented 2 years ago

I have added the following text to the caption of Table 3.6:

The expressions in the ``Description'' column are to be treated as mathematical specifications. That is, each one of $i$, $j$, and $s$ is interpreted as an integer number in the set $\mathbb{Z}$. Functions are evaluated using arithmetic in $\mathbb{Z}$, producing a result value that is also in $\mathbb{Z}$. The result value is converted to the output type according to the rules of the C language. In particular, if the value cannot be represented as a signed 32- or 64-bit integer type, the output is implementation defined. Any deviations from this ideal behavior, including limitations on the values of $i$, $j$, and $s$, or possible overflow and underflow conditions, must be defined by the implementation.

To give an example, let us take Tim Davis's function above:

inline void tril_function (bool *z, const void *unused, GrB_Index i, GrB_Index j, const int64_t *s)
{
    (*z) = (((int64_t) j) <= (((int64_t) i) + (*s))) ;
}

This function satisfies the definition, as long as either

The code for the function is shown (in which case we are letting the rules of C define the behavior, which could be platform-specific); or
There is a restriction imposed on the valid values of (*s) that guarantees the computation (i+s) will always produce a value representable as a 64-bit signed integer. (The range of i and j are already restricted in SuiteSparse:GraphBLAS.)

DrTimothyAldenDavis commented 2 years ago

This is perfect.

mcmillan03 commented 2 years ago

Need one adjustment to the text because not all 's' parameters are in "Z". s is I32/64 in the first three operators; it is I64 in the next 8 operators; but s will be the same type as the scalar element of the input container in the last 6 GrB_VALUE* comparator operators.

Should the table replace the use of italic "Z" so as not to confuse with \mathbb{Z} added with the text above.

joseemoreira commented 2 years ago

I wasn't even thinking of the value functions, because I don't think they are as problematic. But, as I told Scott, value functions are people too :-) I have fixed the language in the caption.

As for replacing the italic "Z", probably not a bad idea. Can I use "I" or is that going to be even more confusing? Any other suggestion?

mcmillan03 commented 2 years ago

One possibility: We already specifically reference I_U64 and I_64 so define I_32 in the caption and then use I_32/64 in place of $Z$ in the table.

joseemoreira commented 2 years ago

Done!

mcmillan03 commented 2 years ago

Closing

GraphBLAS / graphblas-api-c

TRIL, TRIU, DIAG, OFFDIAG, ROWLE, ROWGT, COLLE, and COLGT: only need _INT64 versions #45