GraphBLAS / graphblas-api-c


GrB_Matrix_serialize / deserialize #37

Closed DrTimothyAldenDavis closed 2 years ago

DrTimothyAldenDavis commented 2 years ago

I propose this signature for GrB_Matrix_serialize:

GrB_Info GrB_Matrix_serialize       // serialize a GrB_Matrix to a blob
(
    // output:
    void **blob_handle,             // the blob, allocated on output
    size_t *blob_size_handle,       // size of the blob on output
    // input:
    GrB_Matrix A,                   // matrix to serialize
    const GrB_Descriptor desc       // descriptor to select compression method
                                    // and to control # of threads used
) ;

but that assumes that I malloc the blob, which the user later frees. If you don't want that, then this would also work:

GrB_Info GrB_Matrix_serialize       // serialize a GrB_Matrix to a blob
(
    // output:
    void *blob,                     // the blob, pre-allocated by the caller
    size_t *blob_size_handle,       // size of the blob on input, amount filled on output
    // input:
    GrB_Matrix A,                   // matrix to serialize
    const GrB_Descriptor desc       // descriptor to select compression method
                                    // and to control # of threads used
) ;

Please add the descriptor so I can use it (as GxB at least) to select compression methods and levels to use. Compression is important.
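To make the caller-allocated variant concrete, here is a minimal sketch of the "capacity on input, bytes written on output" convention for the size parameter. It does not use GraphBLAS: `toy_matrix` and `toy_serialize` are hypothetical stand-ins, and reporting the required size when the buffer is too small is an assumption of this sketch, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a matrix: a flat array of doubles. */
typedef struct {
    size_t nvals;
    const double *values;
} toy_matrix;

/* Serialize A into a caller-allocated blob.
 * On input,  *blob_size is the capacity of blob in bytes.
 * On output, *blob_size is the number of bytes written, or the number
 * of bytes required if the call fails (an assumption of this sketch).
 * Returns 0 on success, -1 if blob is NULL or too small. */
int toy_serialize(void *blob, size_t *blob_size, const toy_matrix *A)
{
    assert(blob_size != NULL && A != NULL);
    size_t needed = sizeof(size_t) + A->nvals * sizeof(double);
    if (blob == NULL || *blob_size < needed) {
        *blob_size = needed;    /* tell the caller how much to allocate */
        return -1;
    }
    unsigned char *p = blob;
    memcpy(p, &A->nvals, sizeof(size_t));   /* header: entry count */
    if (A->nvals > 0) {
        memcpy(p + sizeof(size_t), A->values, A->nvals * sizeof(double));
    }
    *blob_size = needed;
    return 0;
}
```

With this convention the caller can first call with a NULL blob to learn the required size, allocate that much, and call again. The trade-off against the `void **blob_handle` variant is that the caller owns both the malloc and the free.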

For GrB_Matrix_deserialize, I propose:

GrB_Info GrB_Matrix_deserialize     // deserialize blob into a GrB_Matrix
(
    // output:
    GrB_Matrix *C,      // output matrix created from the blob
    // input:
    const void *blob,   // the blob
    size_t blob_size,   // size of the blob
    GrB_Type type,      // type of the matrix C.  Required if the blob holds a
                        // matrix of user-defined type.  May be NULL if blob
                        // holds a built-in type.  If not NULL and the blob
                        // holds a matrix of a built-in type, then C is
                        // typecast to this requested type.
    const GrB_Descriptor desc       // to control # of threads used and
                        // whether or not the input blob is trusted.
) ;

I'd like a descriptor setting that tells me if the blob is trusted. If it is, the deserialize is faster. If it can come from an untrusted source, then I can do more exhaustive checks to ensure the resulting matrix is valid.

If passing matrices between MPI processes, serialize/deserialize should be as fast as possible, so the blob can be trusted. If reading the blob from a file that is known to be created by the same person, it can be trusted. But if a blob comes from outside, it shouldn't be trusted and I should do more costly checks on it.
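As an illustration of what the "more costly checks" on an untrusted blob might look like, here is a minimal sketch that validates a CSR-style structure before accepting it. The `toy_csr` layout, the function names, and the `trusted` flag are all hypothetical; the actual blob format and descriptor setting are up to the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CSR view of the contents of a deserialized blob. */
typedef struct {
    size_t nrows, ncols, nvals;
    const size_t *rowptr;   /* nrows + 1 entries */
    const size_t *colidx;   /* nvals entries */
} toy_csr;

/* O(nrows + nvals) structural check: row pointers start at 0, end at
 * nvals, never decrease, and every column index is within bounds. */
bool toy_csr_is_valid(const toy_csr *m)
{
    assert(m != NULL);
    if (m->rowptr == NULL) return false;
    if (m->nvals > 0 && m->colidx == NULL) return false;
    if (m->rowptr[0] != 0 || m->rowptr[m->nrows] != m->nvals) return false;
    for (size_t i = 0; i < m->nrows; i++) {
        /* check the row bounds before touching colidx */
        if (m->rowptr[i] > m->rowptr[i + 1]) return false;
        if (m->rowptr[i + 1] > m->nvals) return false;
        for (size_t k = m->rowptr[i]; k < m->rowptr[i + 1]; k++) {
            if (m->colidx[k] >= m->ncols) return false;
        }
    }
    return true;
}

/* A trusted blob skips the scan entirely; an untrusted one pays for it.
 * Returns 0 if the blob is accepted, -1 if it is rejected. */
int toy_deserialize_check(const toy_csr *m, bool trusted)
{
    if (trusted) return 0;
    return toy_csr_is_valid(m) ? 0 : -1;
}
```

The check is linear in the size of the matrix, which is why skipping it matters for the MPI case: a trusted blob can be accepted in constant time.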

DrTimothyAldenDavis commented 2 years ago

In the latest update to GrB_Matrix_deserialize and GrB_Matrix_serialize, the parameters are jumbled. The serialized_data and serialized_size parameters appear in different orders in each method. In GrB_Matrix_build and GrB_Matrix_extractTuples, an array comes first followed by the size of the array.

In the updated GrB_Matrix_deserialize, the size comes first followed by the array. The GrB_Matrix_serialize uses the pattern in the rest of GrB* methods: the array comes first, followed by its size.

GrB_Matrix_deserialize is an outlier. The parameters "serialized_size" and "serialized_data" should be swapped.

DrTimothyAldenDavis commented 2 years ago

The same pattern holds for GrB_extract and GrB_assign, for the integer index arrays. The arrays come first, followed by their size.
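The ordering convention can be sketched with a pair of toy functions in which every array parameter is immediately followed by its size, on both the packing and the unpacking side. `toy_pack` and `toy_unpack` are hypothetical names used only for illustration; they are not part of the GraphBLAS API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pack: the output array (data) comes first, then its size; the input
 * array (values) comes first, then its length.
 * On input *size is the capacity of data; on output, the bytes written. */
int toy_pack(void *data, size_t *size, const int *values, size_t nvalues)
{
    assert(size != NULL);
    size_t needed = nvalues * sizeof(int);
    if (data == NULL || *size < needed) return -1;
    if (needed > 0) memcpy(data, values, needed);
    *size = needed;
    return 0;
}

/* Unpack: same ordering, array first and size second, for both pairs.
 * On input *nvalues is the capacity of values; on output, the count read. */
int toy_unpack(int *values, size_t *nvalues, const void *data, size_t size)
{
    assert(nvalues != NULL);
    if (data == NULL || size % sizeof(int) != 0) return -1;
    size_t n = size / sizeof(int);
    if (values == NULL || *nvalues < n) return -1;
    if (n > 0) memcpy(values, data, size);
    *nvalues = n;
    return 0;
}
```

Keeping the same order in both directions means a caller can pass `(buf, size)` to either function without checking the prototype.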

BenBrock commented 2 years ago

GrB_Matrix_deserialize is an outlier. The parameters "serialized_size" and "serialized_data" should be swapped.

You're right. Just swapped the order of the parameters in GrB_Matrix_deserialize to make them consistent.

DrTimothyAldenDavis commented 2 years ago

Great!