GraphBLAS / graphblas-api-c

Other
7 stars 3 forks source link

Get/Set Proposal #73

Closed rayegun closed 7 months ago

rayegun commented 1 year ago

This is a very rough draft so far. General semantics are described briefly, but not enough.

jim22k commented 1 year ago

Let's require GrB_Scalar to be populated when setting. If an empty scalar is given, we should raise GrB_EMPTY_OBJECT.

DrTimothyAldenDavis commented 1 year ago

Overall it's a good start. To summarize, I think it has 2 serious flaws:

(1) nothing should return a GrB_Type *. Return a string instead. (2) GrB_Descriptor_set cannot be both a _Generic-based macro and an actual function as existing in the current C API.

DrTimothyAldenDavis commented 1 year ago

I can offer to write a draft implementation of these methods. I can write a draft GrB_get methods that return a GrB_Type *, even though I think that's not the right approach. It will work except that GrB_get won't work on the serialized blob (but you don't have a method for that).

The one thing I can't do is GrB_Descriptor_set, because the C API wants it to be both a _Generic macro and a plain C function. Making it a _Generic macro would break backward API compatibility so this can't be done as a v2.1 spec. You'll need to change the name of the _Generic-based macro to GrB_Desc_set or something.

Better yet, since the GrB_OBJECT_get methods are already _Generic macros, why not just use a single macro, GrB_get ? That macro would access all the non-polymorphic methods in Table 5.12. That way, the _Generic-based GrB_set will not conflict with the pre-existing GrB_Descriptor_set.

eriknw commented 1 year ago

Is it conceivable that a get/set operation could imply a computation? If so, should we add a descriptor to get/set to be able to control the computation in the spirit of #48?

DrTimothyAldenDavis commented 1 year ago

It's possible get/set involves computation, like a transpose of the data structure if the format of a matrix changes. But I don't think a descriptor should be added to guide that computation. In fact, I no longer believe it is necessary to add a descriptor to all methods. Add a GxB_Context perhaps (maybe!) but not a descriptor.

DrTimothyAldenDavis commented 1 year ago

Looks great! So to query a matrix for its type, you can ask for either an enum (where all UDT's are zero) or a string (where each UDT has a unique name). I think that's a good set of options.

Question about the ENUMs. Table 3.2 says that the ENUM value for GrB_UINT32 is 7, for example. But what is the name of the value 7? GrB_UINT32_CODE = 7 perhaps? I can't find the enum that lists the values 0 to 11. Can you add it to table 3.2?

My two extended built-in types, GxB_FC32 and GxB_FC64: can I give those enum values 12 and 13?

Question about the strings: if you ask a matrix with a user-defined type what its type is, via the enum, you always get a zero. If you ask for the type as a string, you get the GrB_NAME of the user-defined type. If the GrB_NAME has not been set, you get the empty string, ("" of length zero, not a NULL), right? That seems reasonable. The only alternative would be to require the GrB_NAME to be passed to GrB_Type_new, but that is disruptive to prior codes that use GrB_Type_new so I know that's not a good idea. It could only be done for a v3.0 spec.

If you ask a matrix for its GrB_ELTYPE_STRING, and the type is built in, what is the string? "GrB_BOOL" for a GrB_BOOL type? The Spec doesn't state what the string is.

Can you ask a serialized blob output from GrB_Matrix_serialize what its type is?

Finally ... would you like me to implement the set/get features in a draft of SuiteSparse:GraphBLAS before v2.1 is released as an official release?

DrTimothyAldenDavis commented 1 year ago

"Built-in algebraic objects and GrB_Types have names which can be read, but not written to."

What are the names of built-in objects (all built-in operators, monoids, and semirings)?

jim22k commented 1 year ago

Question about the ENUMs. Table 3.2 says that the ENUM value for GrB_UINT32 is 7, for example. But what is the name of the value 7? GrB_UINT32_CODE = 7 perhaps? I can't find the enum that lists the values 0 to 11. Can you add it to table 3.2?

I just added it as GrB_Type_Code in table 3.2.

My two extended built-in types, GxB_FC32 and GxB_FC64: can I give those enum values 12 and 13?

I added a note to start from 100 for custom types added by implementations. Maybe a bit overkill, but functional.

Question about the strings: if you ask a matrix with a user-defined type what its type is, via the enum, you always get a zero. If you ask for the type as a string, you get the GrB_NAME of the user-defined type. If the GrB_NAME has not been set, you get the empty string, ("" of length zero, not a NULL), right?

Correct.

That seems reasonable. The only alternative would be to require the GrB_NAME to be passed to GrB_Type_new, but that is disruptive to prior codes that use GrB_Type_new so I know that's not a good idea. It could only be done for a v3.0 spec.

We are planning on that behavior in v3.0, but don't want to make a breaking change in v2.1. Getting back an empty string is technically valid, but not practically useful if the user forgets to give their UDTs names.

If you ask a matrix for its GrB_ELTYPE_STRING, and the type is built in, what is the string? "GrB_BOOL" for a GrB_BOOL type? The Spec doesn't state what the string is.

I clarified this in the spec. You return "GrB_BOOL" for a GrB_BOOL. For operations, you would return "GrB_MIN_FP64" for the GrB_MIN_FP64 binary op. That keeps it simple and consistent between implementations. But hopefully most people will want to use codes rather than strings for builtins and only fall back to strings to UDTs. But the behavior is well defined in either case.

Can you ask a serialized blob output from GrB_Matrix_serialize what its type is?

We should discuss this. The plan is that a serialized blob would act like a collection for the purposes of get/set, meaning you can ask for the ELTYPE_CODE and ELTYPE_STRING, the NAME, and anything else you would ask from a real Matrix/Vector/Scalar. However, I'm not sure how to structure this from a practical level. With polymorphism, the call would be GrB_get(void *blob, GrB_Field field, int* code). The blob is too generic of a type to properly decide it should be handled by GrB_SerializeMatrix_get_ENUM or something like that.

However, looking at the current GrB_Matrix_serializeSize, GrB_Matrix_serialize, and GrB_Matrix_deserialize, it looks almost identical to the get/set proposal for void *. That makes me want to figure out a way to make "serialization" a GrB_Field which you can get or set (set might only work for an empty Matrix). That definitely feels strange, but there is too much overlap of utility to ignore.

Let me know if you have some thoughts on a good way to accomplish this within the get/set framework.

Finally ... would you like me to implement the set/get features in a draft of SuiteSparse:GraphBLAS before v2.1 is released as an official release?

I think that would be an excellent way to find anything we missed in the spec. Code and tests have a rigor that a spec doc lacks intrinsically.

DrTimothyAldenDavis commented 1 year ago

Sounds great. I'll think over the set/get for the serialized blob. It might be a bit tricky for a _Generic macro since it's a mere void *. I could draft some set/get methods that work with the serialize blobs. Probably "set" can't do anything to the blob (the only thing that should be done with a blob is to deserialize it or "get" its name, type, etc).

I will draft up a set/get implementation along the lines of this draft spec. What is the time line when you want to release v2.1? I would like to release a stable v8.0.0 of GraphBLAS sometime soon (in a month perhaps?), with the JIT. This will require the NAME to be set by GrB_set, and also the DEFN by GrB_set so I can JIT the user defined types and ops.

Does GrB_set/get allow for implementations to add their own set/get features? Like I would want to do

GrB_set (op, GxB_FUNCTION_DEFN, 
"void myadd (double *z, const double *y, const double *z) { (*z) = (*x)+(*y);}") ;

for example.

jim22k commented 1 year ago

What is the time line when you want to release v2.1?

I believe we are targeting HPEC.

Does GrB_set/get allow for implementations to add their own set/get features?

Yes. Implementations can define their own GrB_Fields. They need to be explicit about the associated type (Scalar, ENUM, String, void*), which objects they work for, whether they are writable and/or a hint, etc. Basically the same info as in Table 3.13. The spec doesn't explicitly state this, but probably best to start field enums at 500 or 1000 to have a safe buffer against the "official" field enums.

DrTimothyAldenDavis commented 1 year ago

Table 3.15 defines GrB_Storage_Sparsity but table 3.13 refers to GrB_Sparsity.

DrTimothyAldenDavis commented 1 year ago

The design of GrB_set is a problem. The void with the 4th size parameter is broken. I realize you want an extra parameter for GrB_set (object, field, void stuff, int size), if the "stuff" has type "void *", but the other methods do not have this parameter. You can't have both unless I use va_list instead of the macro __VA_ARGS__.

This is tricky to do with the _Generic macro itself (perhaps impossible). I can use __VA_ARGS__ tricks but that method does not handle zero args at all. The _Generic((x), ... ) mechanism must name the 3rd parameter, so it must be this (which will not work):

#define GrB_set(field,arg,...)                                  \
    _Generic                                                    \
    (                                                           \
        (arg),                                                  \
            GrB_Scalar       : GrB_Global_set_Scalar ,          \
            char *           : GrB_Global_set_String ,          \
            int              : GrB_Global_set_ENUM   ,          \
            void *           : GrB_Global_set_VOID              \
    )                                                           \
    (field,arg,__VA_ARGS__)

That does not work because in 3 of the 4 cases, __VA_ARGS__ is empty, but then those 3 methods have a trailing comma. It won't compile.

So I cannot do both GrB_set (object, field, int) and GrB_set (object, field, void *, int size) as a _Generic macro that uses __VA_ARGS__ in a macro. Instead, I would need to use the va_list, va_start, va_arg methods like I do now in my GxB_Global_Option_set method (for example). However, we've had problems with portability of va_list based methods.

C is not as flexible as C++ for this sort of thing so polymorphism has to be done carefully.

DrTimothyAldenDavis commented 1 year ago

The problem gets worse. Ignore the void size argument for now. Even with that assumption, I would still require va_list to make this work. I can't make it work with just _Generic macro magic.

The _Generic mechanism must have a way of testing the type of its arguments. All the arguments tested must be available to all methods. GrB_Scalar_set_Scalar has 3 parameters (o, field, t) where o and t are GrB_Scalars. GrB_Global_set_Scalar has 2 parameters: (field, t).

Selecting between them has to be done with _Generic (arg1), ... to test the type of arg1. But then inside that, the 2nd method has no 3rd argument to test. It's in the wrong place.

The only way I could get this to work with GxB_set was to use va_list. I can't make it work with a pure macro and __VA_ARGS__.

DrTimothyAldenDavis commented 1 year ago

Nice progress. One minor comment. To get the size of an object from GrB_get, is it int * or size_t *?

query_methods.tex, line 21:

The type of the argument is uniquely determined by {\sf field}.
For the case of {\sf char*} and {\sf void*}, the type can be replaced with {\sf size\_t*}
to get the required buffer size to hold the response.

objects.tex says int not size_t :

  1043  \subsubsection{String ({\sf char*}) Handling}
  1044  When the input to {\sf GrB\_set} is a {\sf char*} the input array is null terminated.
  1045  The {\sf GraphBLAS} implementation must copy this array into internal data structures.
  1046  Using {\sf GrB\_get} for strings requires two calls. First, call {\sf GrB\_get} with the
  1047  field and object, but pass {\sf int*} as the last argument. The implementation will return
  1048  the size of the string buffer that the user must create. Second, call {\sf GrB\_get} with
  1049  the field and object, this time passing a pointer to the newly created string buffer.
  1050  The {\sf GraphBLAS} implementation will write to this buffer, including a trailing null
  1051  terminator. The size returned in the first call will include enough bytes for the null terminator.

it should be size_t *, right?

jim22k commented 1 year ago

it should be size_t *, right?

We should start calling you hawkeyes. Yes, I missed that change.

DrTimothyAldenDavis commented 1 year ago

Regarding GrB_NAME, the draft states "The GrB_NAME field is a special case regarding writability. All objects which have a GrB_NAME field default to an empty string.".

Should add "except for GrB_GLOBAL, whose name is the name of the specific library implementation."

Also: is there a way to query a type for its size? I don't see it but it's important.

DrTimothyAldenDavis commented 1 year ago

What happens if you ask a GrB_UnaryOp for the type using GrB_INPUT2TYPE_CODE? That looks like it should be an error.

What happens if you ask a GrB_IndexUnaryOp like GrB_TRIL for its input1 type? Input1 is there for all GrB_IndexUnaryOps, but it would not make sense to ask for its type since it's not used by GrB_TRIL. That might not be an error, but at best something like GrB_NO_VALUE?

DrTimothyAldenDavis commented 1 year ago

The draft spec states: "When the input to GrB_set is a void*, an extra int argument is passed to indicate the size of the buffer."

Shouldn't that parameter be size_t, not int?

DrTimothyAldenDavis commented 1 year ago

Typo on line 683 of nonpolymorphic.tex:

683 {\sf GrB\_set(GrB\_Matrix,GrB\_Scalar),GrB\_Field} & {\sf GrB\_Matrix\_set\_Scalar (GrB\_Matrix,GrB\_Scalar),GrB\_Field} \\

There are two spurious right parentheses after the GrB_Scalar terms.

DrTimothyAldenDavis commented 1 year ago

Typo on line 588

588 {\sf GrB\_get(GrB\_Vector,int*,GrB\_Field)} & {\sf GrB\_Matrix\_get\_ENUM(GrB\_Vector,int*,GrB\_Field)} \\ The nonpolymorphic name should be GrB_Vector_get_ENUM.

Also on line 596.

DrTimothyAldenDavis commented 1 year ago

GrB_OUTP, GrB_MASK, GrB_INP0, and GrB_INP1 are defined as fields in both GrB_Descriptor_Field and in GrB_Field.

It gets flagged as an error to define GrB_OUTP twice (and also the others), in two different enum typedefs, even if they have the same value.

This doesn't compile:

#include <stdio.h>

typedef enum
{
    GrB_OUTP = 0,   // descriptor for output of a method
    GrB_MASK = 1,   // descriptor for the mask input of a method
    GrB_INP0 = 2,   // descriptor for the first input of a method
    GrB_INP1 = 3,   // descriptor for the second input of a method
}
GrB_Desc_Field ;

typedef enum
{
    GrB_OUTP = 0,   // descriptor for output of a method
    GrB_MASK = 1,   // descriptor for the mask input of a method
    GrB_INP0 = 2,   // descriptor for the first input of a method
    GrB_INP1 = 3,   // descriptor for the second input of a method
    GrB_NAME = 10,
    GrB_LIBRARY_VER_MAJOR = 11,     // SuiteSparse:GraphBLAS version
    GrB_LIBRARY_VER_MINOR = 12,
    GrB_LIBRARY_VER_PATCH = 13,
    GrB_API_VER_MAJOR = 14,         // C API version
    GrB_API_VER_MINOR = 15,
    GrB_API_VER_PATCH = 16,
    GrB_BLOCKING_MODE = 17,  
    GrB_STORAGE_ORIENTATION_HINT = 100,
    GrB_ELTYPE_CODE = 102,
    GrB_ELTYPE_STRING = 106,
    GrB_INPUT1TYPE_CODE = 103,
    GrB_INPUT2TYPE_CODE = 104,
    GrB_OUTPUTTYPE_CODE = 105,
    GrB_INPUT1TYPE_STRING = 107,
    GrB_INPUT2TYPE_STRING = 108,
    GrB_OUTPUTTYPE_STRING = 109,
}
GrB_Field ;

int main (void)
{
    printf ("this doesn't compile\n") ;
}

But this works instead:

#include <stdio.h>

typedef enum
{
    GrB_OUTP = 0,   // descriptor for output of a method
    GrB_MASK = 1,   // descriptor for the mask input of a method
    GrB_INP0 = 2,   // descriptor for the first input of a method
    GrB_INP1 = 3,   // descriptor for the second input of a method
}
GrB_Desc_Field ;

typedef enum
{
    GrB_OUTP_FIELD = GrB_OUTP,   // descriptor for output of a method
    GrB_MASK_FIELD = GrB_MASK,   // descriptor for the mask input of a method
    GrB_INP0_FIELD = GrB_INP0,   // descriptor for the first input of a method
    GrB_INP1_FIELD = GrB_INP1,   // descriptor for the second input of a method
    GrB_NAME = 10,
    GrB_LIBRARY_VER_MAJOR = 11,     // SuiteSparse:GraphBLAS version
    GrB_LIBRARY_VER_MINOR = 12,
    GrB_LIBRARY_VER_PATCH = 13,
    GrB_API_VER_MAJOR = 14,         // C API version
    GrB_API_VER_MINOR = 15,
    GrB_API_VER_PATCH = 16,
    GrB_BLOCKING_MODE = 17,   
    GrB_STORAGE_ORIENTATION_HINT = 100,
    GrB_ELTYPE_CODE = 102,
    GrB_ELTYPE_STRING = 106,
    GrB_INPUT1TYPE_CODE = 103,
    GrB_INPUT2TYPE_CODE = 104,
    GrB_OUTPUTTYPE_CODE = 105,
    GrB_INPUT1TYPE_STRING = 107,
    GrB_INPUT2TYPE_STRING = 108,
    GrB_OUTPUTTYPE_STRING = 109,
}
GrB_Field ;

int main (void)
{
    printf ("this is OK\n") ;
}
DrTimothyAldenDavis commented 1 year ago

For the get/set VOID methods: I don't need them for SuiteSparse GraphBLAS, at least so far. Should my GrB_Matrix_get_VOID (...) method do nothing and return GrB_NOT_IMPLEMENTED? Or does it do nothing and return GrB_SUCCESS?

DrTimothyAldenDavis commented 1 year ago

I'm starting to implement the current get/set proposal (with one change: GrB_OUTP, GrB_MASK, GrB_INP0, and GrB_INP1 cannot be defined twice in two enums). See https://github.com/DrTimothyAldenDavis/GraphBLAS/tree/GrB_get_set_proposal which I may call v8.1.0.

If you diff the current master branch (which contains a pending v8.0.1), you can see how things are going. So far so good, except for GrB_OUTP and company, which I have renamed GrB_OUTP_FIELD (same int values) for the GrB_Field enum.

Question about the GrB_NAME. I currently do not have a way to name matrices, vectors, or scalars. It will require a bit of work to add them, and it might require me to make a breaking change to my serialized blobs constructed by GrB_Matrix_serialize (so I can add the name of the matrix). I'd rather not do that, or at least postpone it to v9 of GraphBLAS. Can I return GrB_NOT_IMPLEMENTED if the user asks to set/get the name of a matrix, vector, or scalar?

What about methods that query the serialized blob (a void pointer)? There should be not set methods but there could be get methods: get the type (code or string), the name (in the future), the storage orientation, and so on. Pretty much everything you could call get on a GrB_Matrix, I could do the same for the void blob. That would be very useful. Right now, you have to tell GrB_Matrix_deserialize what the type is of the matrix, and if you get it wrong the deserialize fails (there is no typecasting done). So how does the user application know what type the blob is? That information would have to be held outside the blob, in the current API. But I have all that inside my blob. I have a method called GxB_deserialize_type_name which returns the type of the matrix in the blob (as a string). This could be done with a method such as this:

GrB_blob_get_String (void * blob, char *value, GrB_Field field) ;

Or call it GrB_serialized_get_String or whatever.

DrTimothyAldenDavis commented 1 year ago

Querying a GrB_Type for its size is super important; otherwise, how do we malloc the "X" array for GrB_extractTuples (for example)?

Should there be a GrB method for getting the identity value of a monoid? Perhaps:

GrB_Scalar identity ;
GrB_Scalar_new (&identity, monoidtype) ;   // must match the monoid type; no typecase permitted
GrB_Monoid_get_Scalar (monoid, identity, GrB_IDENTITY_VALUE) ;

To that, I would also add GxB_TERMINAL_VALUE for my built-in and user-defined terminal monoids. If this isn't added, I would just skip it for GrB_get, and use my existing GxB_Monoid_identity and GxB_Monoid_terminal methods.

I also see that there is no way to query a monoid for its additive operator, and no way to query a semiring for its monoid or its multiplicative operator. Is that intentional or an oversight? I could use GrB_Monoid_get_VOID and GrB_Semiring_get_VOID to return (void ) pointers to those GrB_BinaryOps or GrB_Monoid, with enums like GxB_OPERATOR, GxB_MONOID, etc. That would duplicate GxB functionality I already have (GxB_Monoid_operator, GxB_Semiring_add, and GxB_Semiring_multiply) so I might just skip it, and rely on my existing `GxB_ `methods I have already.

DrTimothyAldenDavis commented 1 year ago

GrB_Type is omitted from the caption for Table 3.13. It needs its own category. Currently, only GrB_NAME, GrB_ELTYPE_CODE, and GrB_ELTYPE_STRING are applicable.

I would also add GrB_SIZE = 110, just for GrB_Type_get_SIZE to get the size of the GrB_Type itself, but that's another issue.

DrTimothyAldenDavis commented 1 year ago

I've added GrB_SIZE=110 my implementation of Table 3.13. It can be used for GrB_Type_get_ENUM, GrB_Type_get_Scalar, and GrB_Type_get_SIZE, although GrB_Type_get_ENUM would be an unusual usage of that function since the 2nd parameter is size_t * which triggers the GrB_Type_get_SIZE if used this way:

size_t size ;
GrB_get (type, &size, GrB_SIZE) ;

But the ENUM usage would be triggered by this usage, which seems OK to me:

int size ;
GrB_get (type, &size, GrB_SIZE) ;

These usages return the size of the type. I've called the enum GrB_SIZE but it could also be more specific, like GrB_TYPE_SIZE.

See: https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/GrB_get_set_proposal/Source/GrB_Type_get.c .

DrTimothyAldenDavis commented 1 year ago

Table 3.11 is missing something: GrB_DEFAULT or something like it. Consider this case:

GrB_get (descriptor, &value, GrB_MASK_FIELD)

What is the value if the descriptor is set to the default values (GrB_COMP and GrB_STRUCTURE are both unset)? In my GxB_Desc_get extension, I return GxB_DEFAULT which is zero.

What is the value if both set? My extension GxB_Desc_get returns GrB_COMP + GrB_STRUCTURE, which is the value 6 but perhaps that should be added to Table 3.11 as GrB_COMP_STRUCTURE = 6.

What do these return? I would say they all return GrB_DEFAULT = 0.

int value ;
GrB_get (GrB_DESC_T1, &value, GrB_OUTP_FIELD) ;
GrB_get (GrB_DESC_T1, &value, GrB_MASK_FIELD) ;
GrB_get (GrB_DESC_S, &value, GrB_INP0_FIELD) ;
GrB_get (GrB_DESC_S, &value, GrB_INP1_FIELD) ;
DrTimothyAldenDavis commented 1 year ago

I also allow for these usages (which I did in my GxB_Desc_set extension, and I also allow it in my implementation of the original GrB_Descriptor_set):

GrB_set (descriptor, GrB_COMP + GrB_STRUCTURE, GrB_MASK_FIELD) ;
GrB_set (descriptor, GrB_COMP | GrB_STRUCTURE, GrB_MASK_FIELD) ;

which sets GrB_COMP and GrB_STRUCTURE both to true, at the same time. Conveniently, the value of both expressions is 6 which is not in the enum list in the v2.0 C API Specification. If GrB_COMP_STRUCTURE=6 is added, then this could also be done as:

GrB_set (descriptor, GrB_COMP_STRUCTURE, GrB_MASK_FIELD) ;

jim22k commented 1 year ago

Tim, thanks for adding these review notes. I haven't forgotten about this -- just swamped with other work. I'll update the spec based on these suggestions hopefully late next week (unless someone else on the committee wants to jump in).

jim22k commented 1 year ago

Regarding GrB_NAME, the draft states "The GrB_NAME field is a special case regarding writability. All objects which have a GrB_NAME field default to an empty string.".

Added clarification about which objects have writeable names.

What happens if you ask a GrB_UnaryOp for the type using GrB_INPUT2TYPE_CODE? That looks like it should be an error. What happens if you ask a GrB_IndexUnaryOp like GrB_TRIL for its input1 type? Input1 is there for all GrB_IndexUnaryOps, but it would not make sense to ask for its type since it's not used by GrB_TRIL. That might not be an error, but at best something like GrB_NO_VALUE?

Added text in table 3.14 detailing when GrB_NO_VALUE should be returned.

The draft spec states: "When the input to GrB_set is a void*, an extra int argument is passed to indicate the size of the buffer." Shouldn't that parameter be size_t, not int?

Yes, fixed.

GrB_OUTP, GrB_MASK, GrB_INP0, and GrB_INP1 are defined as fields in both GrB_Descriptor_Field and in GrB_Field.

I went with your suggestion and added the _Field suffix to make those unique.

GrB_Type is omitted from the caption for Table 3.13. It needs its own category. Currently, only GrB_NAME, GrB_ELTYPE_CODE, and GrB_ELTYPE_STRING are applicable. I would also add GrB_SIZE = 110, just for GrB_Type_get_SIZE to get the size of the GrB_Type itself, but that's another issue.

Add GrB_SIZE = 110. It will return a GrB_Scalar to get the size of a GrBType. This keeps the usage consistent which the 4 return types (and GrB_get_SIZE is reserved as a helper to get the buffer size for char and void *).

Should there be a GrB method for getting the identity value of a monoid?

Do we need that for LAGraph? I think for now we won't include this and you can keep it as a GxB extension.

Table 3.11 is missing something: GrB_DEFAULT or something like it. Consider this case: What is the value if the descriptor is set to the default values (GrB_COMP and GrB_STRUCTURE are both unset)? In my GxB_Desc_get extension, I return GxB_DEFAULT which is zero.

Added these to the table.

jim22k commented 1 year ago

We still need to discuss how to get the type of a serialized blob. I would love to reuse our fields, but the GrB_<OBJ> would be a void *. I don't know if we want to assume that void * always means a serialized blob. Thoughts?

The alternative would be to add duplicate fields named GrB_Serialized_ELTYPE_CODE, GrB_Serialized_ELTYPE_STRING, etc.

DrTimothyAldenDavis commented 1 year ago

We still need to discuss how to get the type of a serialized blob. I would love to reuse our fields, but the GrB_ would be a void . I don't know if we want to assume that void always means a serialized blob. Thoughts?

It's probably OK for the _Generic macro to detect a (void *) object and assume it's a serialized object.

Let me try implementing GrB_get for the blob, and see what works and what breaks. We probably need to add another size_t parameter at the end of the signature, with the size of the blob, since otherwise it's unsafe to access the (void *) blob without knowing its size.

DrTimothyAldenDavis commented 1 year ago

Regarding GrB_get to return the monoid identity value, etc:

Do we need that for LAGraph? I think for now we won't include this and you can keep it as a GxB extension.

No, it's not needed for LAGraph. I can see a use case for it though. GrB_reduce to scalar with a non-opaque input/output value. With an accum operator present, the input/output scalar needs to be initialized. It might be handy to know the identity value in that case. Of course, the user could just know it already, or they could set the accum the NULL.

I also have methods that return the binary op of a monoid, the monoid of a semiring, and the multiplicative op of a semiring. I will add those as GxB* extenstions to GrB*_get_VOID, and I will declare my existing methods like GxB_Monoid_operator as historical (kept, but removed from my documentation).

DrTimothyAldenDavis commented 1 year ago

The spec and my implementation are moving along nicely.

I've implemented all these changes you made today in my current get_set branch: https://github.com/DrTimothyAldenDavis/GraphBLAS/tree/GrB_get_set_proposal .

It's not yet fully tested, but getting very close.

I haven't fully implemented the object naming because it will cause lots of changes throughout my code, and for now I'm trying to keep these changes isolated to just the new get/set methods.

In particular, I currently return GrB_NOT_IMPLEMENTED if you try to set the GrB_NAME of a matrix, vector, scalar, or descriptor. User-defined types and operators are given a generic name like "[unnamed_op]", not the empty string, but this of course can be changed with GrB_set. For user-defined monoids and semirings, I give them the name "func_MONOID" (for monoids) where "func" is the name of the binary op. Then GrB_set returns GrB_ALREADY_SET if the user tries to set the name of the monoid or semiring. All these changes will affect a wide swath of my code, and I want to do that later.

I haven't written an update to my user guide yet. My goal is to implement most or all of these spec changes, and then write my draft chapter in the user guide on GrB_get/GrB_set. Then I can carefully highlight how my implementation differs from the spec or extends it. That will serve as a useful list of places where either my draft implementation can change, or where the draft spec might change. I will remove my GxB_get / GxB_set from the user guide and declare them "historical" in GraphBLAS.h. They will still work but they are fully superseded by GrB_get/GrB_set, which is nice.

I'm still unsure about things like GrB_Monoid_set_ENUM, which can do nothing to a monoid. Does it return GrB_NOT_IMPLEMENTED? GrB_INVALID_VALUE? Silently do nothing and return GrB_SUCCESS? Here are all of my get/set methods that do nothing and I don't foresee them doing anything when I've completely implemented the spec and my extensions. There are a lot of them, and for now I'm returning GrB_NOT_IMPLEMENTED for these:

GrB_BinaryOp_set_Scalar
GrB_BinaryOp_set_ENUM
GrB_BinaryOp_get_VOID
GrB_BinaryOp_set_VOID

GrB_IndexUnaryOp_set_Scalar
GrB_IndexUnaryOp_set_ENUM
GrB_IndexUnaryOp_get_VOID
GrB_IndexUnaryOp_set_VOID

GrB_UnaryOp_set_Scalar
GrB_UnaryOp_set_ENUM
GrB_UnaryOp_get_VOID
GrB_UnaryOp_set_VOID

GrB_Descriptor_get_VOID
GrB_Descriptor_set_VOID

GrB_Matrix_get_VOID
GrB_Matrix_set_VOID

GrB_Scalar_get_VOID
GrB_Scalar_set_VOID

GrB_Vector_get_VOID
GrB_Vector_set_VOID

GrB_Monoid_set_Scalar
GrB_Monoid_set_ENUM
GrB_Monoid_set_VOID

GrB_Semiring_set_Scalar
GrB_Semiring_set_ENUM
GrB_Semiring_set_VOID

GrB_Type_set_Scalar
GrB_Type_set_ENUM
GrB_Type_set_VOID
GrB_Type_get_VOID

I'm using the GrB_Monoid_get_VOID to return the binary operator, and GrB_Semiring_get_VOID to get the monoid and the multiply op, as GxB extensions. I'm using the get/set VOID for the GrB_GLOBAL object for many extensions. Otherwise, I'm not using any get_VOID or set_VOID (see the list above).

DrTimothyAldenDavis commented 1 year ago

Regarding your question about the serialized blob: I think it's safe to use (void *) to always mean a serialized blob. The only thing that won't work is the GrB_NULL for the default descriptor. For example:

GrB_get (GrB_NULL, &i, GrB_OUTP) ; That won't work since GrB_NULL is a generic (void *) pointer, not necessarily a GrB_Descriptor. But I don't think that's a big loss. If I do

char name [...] ;
GrB_get_Descriptor_String (GrB_NULL, name, GrB_NAME) ;

I return the valid result of "GrB_NULL". GrB_get (GrB_NULL, name, GrB_NAME) won't work if (void *) means the blob, but that's no big deal.

I think it's more important to let (void ) always be a serialized blob, to allow the blob type to be queried. There are no other (void ) pointers in GraphBLAS, except for the NULL descriptor.

It should also be possible to reuse all the existing fields. I would just need to augment the signatures with this (for example):

GrB_serialized_get_String (blob, value, field, sizeof_blob) ; and so on. But that will work just fine with the existing GrB_get _Generic macro. I just need to use VA_ARGS arguments for both GrB_get and GrB_set.

DrTimothyAldenDavis commented 1 year ago

The spec states that all three of these can be performed but GrB_ELTYPE_STRING seems redundant:

GrB_set (GrB_Type, "typename", GrB_NAME) ;   // sets the name of the type to "typename"
GrB_get (GrB_Type, char * name, GrB_NAME) ;  // gets the name of the type
GrB_get (GrB_Type, char * name, GrB_ELTYPE_STRING) ;  // gets the name of the type

I would suggest removing the option for GrB_get with GrB_ELTYPE_STRING. The get/set of GrB_NAME is sufficient. It's also curiously mismatched; if it is the type name, then GrB_set should work with GrB_ELTYPE_STRING, but that seems odd.

DrTimothyAldenDavis commented 1 year ago

Jim writes:

Add GrB_SIZE = 110. It will return a GrB_Scalar to get the size of a GrB_Type.

In otherwords, it cannot return a plain int? The table says "GrB_Scalar (int32)" for this case. I may I have misunderstood the spec. Table 3.13 has many cases of "GrB_Scalar (int32)". I thought that meant the _ENUM signature (using an int) can be used as well, as in this:

// this is not allowed by the spec:
int v1, v2, v3 ;
GrB_Global_get_ENUM (GrB_GLOBAL, &v1, GrB_LIBRARY_VER_MAJOR) ;
GrB_Global_get_ENUM (GrB_GLOBAL, &v2, GrB_LIBRARY_VER_MINOR) ;
GrB_Global_get_ENUM (GrB_GLOBAL, &v3, GrB_LIBRARY_VER_PATCH) ;
printf ("library %d.%d.%d\n", v1,v 2, v3) ;

But from the discussion about getting the size of a GrB_Type (as just a GrB_Scalar and not a plain int) this is not allowed. Or to use the polymorphic cases:

// this is not allowed by the current v2.1 spec:
int v1, v2, v3 ;
GrB_get (GrB_GLOBAL, &v1, GrB_LIBRARY_VER_MAJOR) ;
GrB_get (GrB_GLOBAL, &v2, GrB_LIBRARY_VER_MINOR) ;
GrB_get (GrB_GLOBAL, &v3, GrB_LIBRARY_VER_PATCH) ;
printf ("library %d.%d.%d\n", v1,v 2, v3) ;

But the following must be used instead?

int v1, v2, v3 ;
GrB_Scalar s ;
GrB_Scalar_new (&s, GrB_INT32) ;
GrB_Global_get_Scalar (GrB_GLOBAL, s, GrB_LIBRARY_VER_MAJOR) ;
GrB_Scalar_extractElement (&v1, s) ;
GrB_Global_get_Scalar (GrB_GLOBAL, s, GrB_LIBRARY_VER_MINOR) ;
GrB_Scalar_extractElement (&v2, s) ;
GrB_Global_get_Scalar (GrB_GLOBAL, s, GrB_LIBRARY_VER_PATCH) ;
GrB_Scalar_extractElement (&v3, s) ;

That seems very tedious to me. I have many GxB uses for GrB_get and GrB_set that use plain int's that are not enums. For example, the spec does not allow me to do this with plain int's:

// this is not allowed by the v2.1 spec:
GrB_get (GrB_GLOBAL, &nthreads, GxB_NTHREADS) ;
GrB_set (GrB_GLOBAL, 8, GxB_NTHREADS) ;
GrB_get (context, &nthreads, GxB_NTHREADS) ;
GrB_set (context, 8, GxB_NTHREADS) ;
GrB_get (GrB_FP64, &size, GrB_SIZE) ; // returns the size of the type, as sizeof(double) in this case
GrB_set (descriptor, GxB_COMPPRESSION_ZSTD + 6, GxB_COMPRESSION) ;

and so on. Those are all plain int's or int *, and it would be very tedious to wrap them in a GrB_Scalar first.

Why not rename the GrB_get_*_ENUM and GrB_set_*_ENUM methods as simply _INT, not _ENUM, and then use them for both enums and plain int's? The name _INT is more accurate because the function signature says int. For example:

GrB_get_Type_ENUM (GrB_Type type, int *value, GrB_FIELD field) ;

The second parameter is not an enum but an int. Technically, the C compiler can use int16_t, or uint8_t, if it likes. Section 6.7.2.2 of the ANSI C11 spec states:

Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration. The enumerated type is incomplete until immediately after the } that terminates the list of enumerator declarations, and complete thereafter.

An implementation may delay the choice of which integer type until all enumeration constants have been seen.

So we cannot actually assume that all enum's are int's. The values of the constants in an enum have type int, but the enum itself can be any compatible type.

Since enum's are not int's, the name of this method is not correct:

GrB_Global_get_ENUM (GrB_Global, int, GrB_FIELD) ;

since the 2nd parameter is not an enum. It's an int. I use int in the _Generic trigger case as well. See for example

https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/fbcff5242a159a55b01ff6bcd6f8f5fd329e245d/Include/GraphBLAS.h#L4581

#define GrB_get(object,value,field)                                         \
    _Generic                                                                \
    (                                                                       \
        (object),                                                           \
            GrB_Scalar :                                                    \
                _Generic                                                    \
                (                                                           \
                    (value),                                                \
                        GrB_Scalar  : GrB_Scalar_get_Scalar ,               \
                        char *      : GrB_Scalar_get_String ,               \
                        int *       : GrB_Scalar_get_ENUM   ,               \
                        size_t *    : GrB_Scalar_get_SIZE   ,               \
                        void *      : GrB_Scalar_get_VOID                   \
                ) ,                                                         \
...

and note the strange case above with "int * : GrB_Scalar_get_ENUM" appears. This should be called:

GrB_Global_get_INT (GrB_Global, int, GrB_FIELD) ;

and likewise all other _ENUM methods should be renamed _INT.

Technically, this is broken and might not compile because GrB_Type_code could be uint8_t:

GrB_Type_Code code ;
GrB_get (type, &code, GrB_ELTYPE_CODE) ;

because sizeof(code)could be 1, not the same as sizeof(int), and then the _Generic case for int * will not trigger a call to GrB_Type_get_ENUM.

What we need instead is this:

int code ;
GrB_get (type, &code, GrB_ELTYPE_CODE) ;

which is the same as this non-polymorphic case

GrB_Type_get_INT (type, &code, GrB_ELTYPE_CODE) ; This is safe because we know that sizeof(code) == sizeof (int). Then the int code can be compared with:

if (code == GrB_BOOL_CODE) ... etc because the compiler will typecast GrB_BOOL_CODE from the enum (perhaps of size 1, 2, or 4, or whatever) to the int.

The current draft of the v2.1 C API makes an assumption about the ANSI C compiler and may break if that assumption doesn't hold. The ANSI C11 spec does not require a compiler to follow any rule that says sizeof(GrB_Type_Code) == sizeof(int).

jim22k commented 1 year ago

@DrTimothyAldenDavis the spec committee agreed to change _get_ENUM to _get_INT32. This can be used for enumerated values as well as any other values (e.g. nthreads, library_major_version, etc).

DrTimothyAldenDavis commented 1 year ago

Yay! I'll update my implementation shortly.

DrTimothyAldenDavis commented 1 year ago

I've updated my implementation with the change of ENUM to INT32, in my latest push to this branch: https://github.com/DrTimothyAldenDavis/GraphBLAS/tree/GrB_get_set_proposal

The only thing left in my implementation where I currently deviate from the spec is setting the GrB_NAME more than once for GrB_Type and algebraic objects. Either way is fine with me. Just let me know if that decision is settled, and I'll revise my implementation to conform.

REVISED: My implementation conforms to the requirement that the name of a type or algebraic object can be set only once.

I've started a branch of LAGraph (https://github.com/GraphBLAS/LAGraph/tree/v1.1) where I will be adding usage of the new GrB_get/set methods.

DrTimothyAldenDavis commented 1 year ago

One new issue about the current v2.1. draft: GrB_SIZE returns an INT32 for the size of a type. It should be UINT64 (and then it would require a GrB_Scalar). Otherwise, we're limiting the size of a user-defined type to 2GB.

DrTimothyAldenDavis commented 9 months ago

One new issue about the current v2.1. draft: GrB_SIZE returns an INT32 for the size of a type. It should be UINT64 (and then it would require a GrB_Scalar). Otherwise, we're limiting the size of a user-defined type to 2GB.

In my draft v9.0.0 of SuiteSparse:GraphBLAS, GrB_Type_get_Scalar assigns a UINT64 value to the GrB_Scalar, so if a GrB_Scalar of type GrB_UINT64 is passed in, no typecasting occurs and the type can thus be larger than INT_MAX. This behavior is an extension to the current v2.1 draft spec, however.

When calling GrB_Type_get_INT32, the GrB_SIZE field appears so the return value is an int32_t scalar, per the current spec.

DrTimothyAldenDavis commented 9 months ago

The spec should mention that when GrB_get and GrB_set use a GrB_Scalar parameter, that typecasting is done as needed. That is, the user need not provide a GrB_Scalar of the expected type, and that GraphBLAS will typecast if the type is not what is expected.

DrTimothyAldenDavis commented 9 months ago

The size input parameter to GrB_Type_new is size_t, but the type for GrB_Type_get_INT32 is int32_t, which is not consistent.

The type of the size parameter GrB_get (type, size, GrB_SIZE) should be either just GrB_Scalar (so it can return a GrB_UINT64), or a C scalar of type size_t (thus using GrB_Type_get_SIZE instead of GrB_Type_get_INT32).

jim22k commented 8 months ago

@DrTimothyAldenDavis I changed the spec so that GrB_get (type, size, GrB_SIZE) will return a size_t instead of int32_t.

DrTimothyAldenDavis commented 8 months ago

@DrTimothyAldenDavis I changed the spec so that GrB_get (type, size, GrB_SIZE) will return a size_t instead of int32_t.

Yes -- I got the notification on email from github. I've already revised my draft v9 accordingly. I added GrB_SIZE as a valid input enum for GrB_Type_get_SIZE, returning a size_t. I've removed the option of returning it as an int32_t:

https://github.com/DrTimothyAldenDavis/GraphBLAS/commit/8a1b6aed66080bcafe0f03f0746ec56df2c4e756

Since there is no GrB_Type of size_t, so the GrB_Scalar return value is best done as GrB_UINT64.

That does bring up a related point about GrB_get returning a GrB_Scalar. The spec should state that the return value of GrB_get need not match the type of the GrB_Scalar, and that in that case, GrB_get will typecast its result into the type of the GrB_Scalar. This requires that the GrB_Scalar have a built-in type, of course.

The type of size_t could be 32-bits on a 32-bit system, or 64-bit on a 64-bit system, so I have to typecast in general. Passing in a GrB_Scalar of type GrB_UINT64 is the best match, but even if the GrB_Scalar is GrB_INT32 then the sizeof(type) can typically safely returned, just typecasting to the result into the type of the GrB_Scalar.

Likewise, if GrB_set is passed a GrB_Scalar, then the spec should state that GraphBLAS will typecast the GrB_Scalar into the expected type, as needed.

eriknw commented 7 months ago

Heads up: we plan to merge this PR in 1 day 🚀

DrTimothyAldenDavis commented 7 months ago

Yay! Just in time for Handel's Hallelujah Chorus. 😀

tgmattso commented 7 months ago

The spec is done. No more pull requests. Make any changes in the master branch.

For a standard, I want one place where we can find the state of the full specification. All this pull request stuff at this point just adds confustion.

—Tim

On Dec 23, 2023, at 2:03 PM, Doc McMillan @.***> wrote:

@mcmillan03 commented on this pull request.

In graph-api-c/appendices.tex https://github.com/GraphBLAS/graphblas-api-c/pull/73#discussion_r1435718715:

@@ -4,15 +4,15 @@ \chapter{Revision history}

Changes in 2.1.0 (Released: 22 December 2023): % FIXME \begin{itemize} -\item We added a capability for meta-data associated with each GraphBLAS object and the -library implementation (the global scope) as well. This was done through the new -{\sf GrB_get} and {\sf GrB_set} methods with {({\sf feild}, {\sf value})} pairs. We also needed

  • a new error code for the case where an attempt is made to write to a write-once value, +\item (Issue BB-28, BB-27, BB-13, BB-7) We added a capability for meta-data associated with each GraphBLAS +object and the library implementation (the global scope) as well. This was done through the new +{\sf GrB_get} and {\sf GrB_set} methods with {({\sf field}, {\sf value})} pairs. We also needed
  • a new error code for the case where an attempt is made to write to a write-once feild, field is misspelled on this line

— Reply to this email directly, view it on GitHub https://github.com/GraphBLAS/graphblas-api-c/pull/73#pullrequestreview-1795500343, or unsubscribe https://github.com/notifications/unsubscribe-auth/AATVME2GZ4TBSZF7Z3GFS4DYK5IMFAVCNFSM6AAAAAASABBEHGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTOOJVGUYDAMZUGM. You are receiving this because your review was requested.

mcmillan03 commented 7 months ago

There is a misspelled word....see comment.

On Sat, Dec 23, 2023 at 2:06 PM Tim Mattson @.***> wrote:

The spec is done. No more pull requests. Make any changes in the master branch.

For a standard, I want one place where we can find the state of the full specification. All this pull request stuff at this point just adds confustion.

—Tim

On Dec 23, 2023, at 2:03 PM, Doc McMillan @.***> wrote:

@mcmillan03 commented on this pull request.

In graph-api-c/appendices.tex < https://github.com/GraphBLAS/graphblas-api-c/pull/73#discussion_r1435718715>:

@@ -4,15 +4,15 @@ \chapter{Revision history}

Changes in 2.1.0 (Released: 22 December 2023): % FIXME \begin{itemize} -\item We added a capability for meta-data associated with each GraphBLAS object and the -library implementation (the global scope) as well. This was done through the new -{\sf GrB_get} and {\sf GrB_set} methods with {({\sf feild}, {\sf value})} pairs. We also needed

  • a new error code for the case where an attempt is made to write to a write-once value, +\item (Issue BB-28, BB-27, BB-13, BB-7) We added a capability for meta-data associated with each GraphBLAS +object and the library implementation (the global scope) as well. This was done through the new +{\sf GrB_get} and {\sf GrB_set} methods with {({\sf field}, {\sf value})} pairs. We also needed
  • a new error code for the case where an attempt is made to write to a write-once feild, field is misspelled on this line

— Reply to this email directly, view it on GitHub < https://github.com/GraphBLAS/graphblas-api-c/pull/73#pullrequestreview-1795500343>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/AATVME2GZ4TBSZF7Z3GFS4DYK5IMFAVCNFSM6AAAAAASABBEHGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTOOJVGUYDAMZUGM>.

You are receiving this because your review was requested.

— Reply to this email directly, view it on GitHub https://github.com/GraphBLAS/graphblas-api-c/pull/73#issuecomment-1868378079, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANXEP5KYWA4TUPC5EYY4QLYK5IVPAVCNFSM6AAAAAASABBEHGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRYGM3TQMBXHE . You are receiving this because you were mentioned.Message ID: @.***>