GraphBLAS / graphblas-api-c

Other
7 stars 3 forks source link

GrB_PANIC is too extreme for invalid serialized data #63

Closed DrTimothyAldenDavis closed 2 years ago

DrTimothyAldenDavis commented 2 years ago

The draft v2.0 spec states that GrB_PANIC means that no program data of any kind can be guaranteed if a panic occurs (line 731). A PANIC means that no GraphBLAS object in the entire program can be trusted. Emphasis added:

If GraphBLAS method returns with a GrB_PANIC execution error, no guarantees can be made about the state of any program data.

This is a heavy statement so it should be up to the library implementer to interpret. It shouldn't be a required return value in other cases. Table 3.13 uses tamer language: "unknown internal error". I would call it "unrecoverable error" to match the statement on line 731, but that is another issue (#64). If any GrB_method returns GrB_PANIC, the user application should just call GrB_finalize and terminate.

The GrB_Matrix_deserialize spec states on line 2562:

GrB_PANIC Unknown internal error. This is also returned if serialized_data was invalid or corrupted. [Scott: This was an INVALID OBJECT error before but don’t think that is correct.]

PANIC is too extreme. It means that any and all program data (not just this particular serialized_data array) cannot be considered valid. Even though there is no GrB_error (&err, serialized_data) syntax, the error returned here should be GrB_INVALID_OBJECT, not GrB_PANIC.

I only return GrB_PANIC in two cases: (1) any GrB* (or GxB) method is called prior to calling GrB_init, or (2) there is an unrecoverable error in GrB_init itself (at the moment, this is only if I fail to connect to a GPU when CUDA is enabled, but this is not yet in my stable releases). In the future, if I use aggressive kernel fusion with non-blocking mode, I can imaging keeping a complex sequence or DAG of GrB calls, for later execution. If that DAG gets mangled, I may not be able to recover from it, and then all GrB objects might become mangled and untrusted. That would be a good use case for GrB_PANIC.

Another use case for the panic is a failure in a critical section. OpenMP cannot fail but a pthreads critical section can. I used to have both, and I would return PANIC if the synchronization failed. There's no way to recover from that error, and the failure might mean that any kind of data just got mangled. Nothing can be trusted, so PANIC. I now use OpenMP #omp critical exclusively so I don't panic in this case because it cannot occur.

A mere corrupted serialized_data object should not cause a panic. It's a read-only object. The user might not have passed in the right pointer. I can safely detect that the pointer I have does not point to a valid serialized object, so there is no need to PANIC and terminate the entire application. I can just return some kind of error like GrB_INVALID_OBJECT, or another kind of error code (see #64).

mcmillan03 commented 2 years ago

I am okay with putting this back to INVALID_OBJECT. I don't think there is official language that ties this error to being able to call error() on something afterward. Any chance this could be an API (immediate) error.

DrTimothyAldenDavis commented 2 years ago

There is no overall statement about GrB_INVALID_OBject, but every place where GrB_INVALID_OBJECT is listed as a possible error return value, the following language is used (emphasis added):

GrB_INVALID_OBJECT This is returned in any execution mode whenever one of the opaque GraphBLAS objects (input or output) is in an invalid state caused by a previous execution error. _Call GrBerror() to access any error messages generated by the implementation.

GrB_Matrix_deserialize would be the exception to this language. I think it's a reasonable exception.

mcmillan03 commented 2 years ago

Reverted to INVALID_OBJECT.