Closed: willow-ahrens closed this issue 1 year ago.
This is something we've discussed, but I don't recall the current consensus.
A related issue is how to store the value of an iso-valued tensor. At first we were thinking to include the iso-value in metadata, but I think we are leaning towards saving a length-1 array to store the value so that we don't need to worry about how to accurately save the value in JSON.
So, if we store the iso-value in an array, then I suppose we could do the same for non-zero fill values.
Yes, this is in the current spec.
Current consensus is to store the single value as a length-1 array in the data container. (If we stored it directly in JSON, we'd have to worry about all types being representable both in pure JSON and also in the data container.)
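The concern about JSON can be illustrated with a minimal sketch. It uses only Python's standard library; the bytes produced by `struct` are a stand-in for a length-1 array in the data container and are an assumption for illustration, not the actual binsparse layout.

```python
import json
import struct

value = float("inf")  # a fill or iso-value that strict JSON cannot express

# Strict JSON has no literal for inf or NaN, so this raises ValueError.
try:
    json.dumps(value, allow_nan=False)
    json_ok = True
except ValueError:
    json_ok = False

# Storing the value as a length-1 float64 array keeps the exact bits.
raw = struct.pack("<1d", value)          # 8 bytes, little-endian float64
(restored,) = struct.unpack("<1d", raw)

print(json_ok)            # False
print(restored == value)  # True
```

The same argument applies to other types that pure JSON handles poorly, which is why the value lives next to the data rather than in the metadata.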
My question was about a custom zero (background), not the case where all the nonzeros (foreground) are the same.
I would have a problem with that in GraphBLAS. The implied value of entries not present in the matrix depends on the semiring used. Any given matrix can move between semirings. So in GraphBLAS, the "fill value" would have to be ignored.
In Julia a fill value is sometimes expected, and I think SuiteSparseGraphBLAS defines a novalue type for this purpose. Would it be okay to just define novalue when working with GraphBLAS matrices? Sometimes the fill value is very explicitly zero, sometimes it means something akin to "there is no edge here", and it would be nice to be able to say which it is in the format.
By design, GraphBLAS has no implied fill value. Empty entries in the matrix aren't nonzeros, they just don't contain anything. So GraphBLAS would have to ignore any fill value. (Which would probably be a fine thing to do.)
(The rationale for this is that the identity value changes constantly with the semiring, which will often be different in successive operations. Some operations may just use a binary operator and have no identity value at all.)
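A small sketch of why the implied value depends on the semiring. This is plain Python, not the GraphBLAS API; the dict-based vectors and the `sparse_dot` helper are hypothetical names used only for illustration.

```python
import math

# A sparse vector stored as {index: value}; missing indices hold nothing.
u = {0: 2.0, 3: 5.0}
v = {3: 1.0, 7: 4.0}

def sparse_dot(u, v, add, mul, identity):
    """Reduce mul(u[i], v[i]) with add over indices present in both vectors."""
    result = identity
    for i in u.keys() & v.keys():
        result = add(result, mul(u[i], v[i]))
    return result

# Conventional plus-times semiring: the additive identity is 0.
plus_times = sparse_dot(u, v, lambda a, b: a + b, lambda a, b: a * b, 0.0)

# Min-plus (tropical) semiring: the additive identity is +inf.
min_plus = sparse_dot(u, v, min, lambda a, b: a + b, math.inf)

print(plus_times)  # 5.0
print(min_plus)    # 6.0
```

The same stored entries behave as if the missing ones were 0 in the first call and +inf in the second, so no single stored fill value describes both uses.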
I agree with @BenBrock, I think it would be okay for GraphBLAS to simply ignore the fillvalue and to not save a fillvalue by default.
I think allowing fill values is important for some use cases. Zarr and pydata/sparse also allow non-zero fill values.
To summarize the discussion today, it sounds like we want to add a JSON key for the fill called "fill", and we want the key to have two possible values: "specified" or "unspecified". Fill defaults to "unspecified". If "fill" is "specified", there's an HDF5 array called "fill" that holds a single value, which is the fill value.
Yes. The only other thing we might want to mention in the spec is that the fill array must have the same data type as the values array, so there is no need to specify the fill data type in the metadata.
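The proposal summarized above could be sketched as follows. A plain dict stands in for the HDF5 container and `array.array` for its datasets; `write_fill` and `read_fill` are hypothetical helper names, not part of any binsparse implementation.

```python
import json
from array import array

def write_fill(container, values, fill_value=None):
    """Store values and, optionally, a length-1 fill array of the same type."""
    container["values"] = values
    meta = {"fill": "unspecified"}
    if fill_value is not None:
        # Reuse the typecode of values so the two types cannot diverge.
        container["fill"] = array(values.typecode, [fill_value])
        meta["fill"] = "specified"
    container["metadata"] = json.dumps(meta)

def read_fill(container):
    """Return the fill value, or None when the fill is unspecified."""
    meta = json.loads(container["metadata"])
    if meta.get("fill", "unspecified") != "specified":
        return None
    fill = container["fill"]
    if fill.typecode != container["values"].typecode or len(fill) != 1:
        raise ValueError("fill must be a length-1 array typed like values")
    return fill[0]

container = {}
write_fill(container, array("d", [1.5, 2.5]), fill_value=float("inf"))
print(read_fill(container))  # inf
```

Because the fill array borrows its type from the values array, the metadata only needs to say whether a fill is present.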
I agree. Let's add that to the spec for now. It feels sort of related to https://github.com/GraphBLAS/binsparse-specification/issues/24
Will the values for "fill" in metadata ever be anything other than "specified" or "unspecified"? If it's boolean by nature, why not use a key in the metadata that stores a boolean?
Yes, let's make it boolean.
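With a boolean key, a reader needs no string comparison. A minimal sketch, assuming a metadata fragment of the form shown (the fragment itself is hypothetical) and assuming an absent key means no fill array, which mirrors the earlier "unspecified" default:

```python
import json

metadata = json.loads('{"fill": true}')  # hypothetical metadata fragment
has_fill = metadata.get("fill", False)   # absent key: no fill array stored

print(has_fill)  # True
```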
Is it in scope to allow users to choose their own fill value (sometimes called implicit value, usually zero)?