GraphBLAS / binsparse-specification

A cross-platform binary storage format for sparse data, particularly sparse matrices.
https://graphblas.org/binsparse-specification/
BSD 3-Clause "New" or "Revised" License

Custom fill values #23

Closed by willow-ahrens 1 year ago

willow-ahrens commented 1 year ago

Is it in scope to allow users to choose their own fill value (sometimes called implicit value, usually zero)?

eriknw commented 1 year ago

This is something we've discussed, but I don't recall the current consensus.

A related issue is how to store the value of an iso-valued tensor. At first we were thinking to include the iso-value in metadata, but I think we are leaning towards saving a length-1 array to store the value so that we don't need to worry about how to accurately save the value in JSON.

So, if we store the iso-value in an array, then I suppose we could do the same for non-zero fill values.

BenBrock commented 1 year ago

Yes, this is in the current spec.

BenBrock commented 1 year ago

Current consensus is to store the single value as a length-1 array in the data container. (If we stored it directly in JSON, we'd have to worry about all types being representable both in pure JSON and also in the data container.)
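A minimal sketch of the representability concern, using only the Python standard library. The `array` module stands in for the data container here (an assumption made for illustration; the real container would be something like HDF5). NaN is just one example of a value that strict JSON cannot express as a number.

```python
import json
import math
from array import array

# Strict JSON has no literal for NaN or infinity, so a float64 value such
# as NaN cannot be written as a standard JSON number.
try:
    json.dumps({"value": float("nan")}, allow_nan=False)
    strict_json_ok = True
except ValueError:
    strict_json_ok = False

# A length-1 typed array keeps the value in its native binary form.
stored = array("d", [float("nan")])    # "d" = C double
restored = array("d")
restored.frombytes(stored.tobytes())   # simulate writing and reading raw bytes

print(strict_json_ok)            # False
print(math.isnan(restored[0]))   # True
print(len(restored))             # 1
```

Storing the single value in the container also means it takes the same type path as the rest of the values, so no JSON-specific encoding rules are needed per type.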

willow-ahrens commented 1 year ago

My question was about a custom zero (background), not the case where all the nonzeros (foreground) are the same.

DrTimothyAldenDavis commented 1 year ago

I would have a problem with that in GraphBLAS. The implied value of entries not present in the matrix depends on the semiring used. Any given matrix can move between semirings. So in GraphBLAS, the "fill value" would have to be ignored.

willow-ahrens commented 1 year ago

In Julia a fill value is sometimes expected, and I think SuiteSparseGraphBLAS defines a novalue type for this purpose. Would it be okay to just define novalue when working with GraphBLAS matrices? Sometimes the fill value is very explicitly zero, sometimes it means something akin to "there is no edge here", and it would be nice to be able to say which it is in the format.

BenBrock commented 1 year ago

By design, GraphBLAS has no implied fill value. Empty entries in the matrix aren't nonzeros, they just don't contain anything. So GraphBLAS would have to ignore any fill value. (Which would probably be a fine thing to do.)

(The rationale for this is that the identity value changes constantly with the semiring, which will often be different in successive operations. Some operations may just use a binary operator and have no identity value at all.)
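To illustrate why a single stored fill value does not fit this model, here is a rough sketch in plain Python (not the GraphBLAS API; `mxv`, `PLUS_TIMES`, and `MIN_PLUS` are hypothetical names chosen for this example). The same stored entries are multiplied under two semirings, and an explicit zero fill is harmless under one but changes the answer under the other.

```python
import math

# Hypothetical semiring descriptions: (add, multiply, additive identity).
PLUS_TIMES = (lambda a, b: a + b, lambda a, b: a * b, 0.0)
MIN_PLUS = (min, lambda a, b: a + b, math.inf)

def mxv(matrix, vector, semiring):
    """Multiply a sparse matrix by a dense vector, visiting stored entries only."""
    add, mul, identity = semiring
    n_rows = 1 + max(i for i, _ in matrix)
    result = [identity] * n_rows
    for (i, j), a_ij in matrix.items():
        result[i] = add(result[i], mul(a_ij, vector[j]))
    return result

# Stored entries keyed by (row, column); (0, 1) and (1, 0) are simply absent.
A = {(0, 0): 2.0, (1, 1): 3.0}
# The same matrix with the missing entries written out as explicit zeros.
A_zero = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0}
x = [1.0, 4.0]

print(mxv(A, x, PLUS_TIMES))       # [2.0, 12.0]
print(mxv(A_zero, x, PLUS_TIMES))  # [2.0, 12.0]
print(mxv(A, x, MIN_PLUS))         # [3.0, 7.0]
print(mxv(A_zero, x, MIN_PLUS))    # [3.0, 1.0]
```

Under plus-times a missing entry behaves like 0, while under min-plus it behaves like infinity, so no single fill value stored in the file would be right for both operations.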

eriknw commented 1 year ago

I agree with @BenBrock. I think it would be okay for GraphBLAS to simply ignore the fill value and to not save a fill value by default.

I think allowing fill values is important for some use cases. Zarr and pydata/sparse also allow non-zero fill values.

willow-ahrens commented 1 year ago

To summarize the discussion today, it sounds like we want to add a JSON key called "fill" with two possible values: "specified" or "unspecified". "fill" defaults to "unspecified". If "fill" is "specified", there is an HDF5 array called "fill" that holds a single value, which is the fill value.
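A rough sketch of how a writer and reader might handle this proposal. It is not spec text: `write_fill` and `read_fill` are hypothetical helpers, and a plain dict of `array.array` objects stands in for the HDF5 file so that the example needs only the standard library.

```python
import json
from array import array

def write_fill(metadata, container, fill_value=None, typecode="d"):
    """Record an optional fill value using the proposed "fill" key."""
    if fill_value is None:
        metadata["fill"] = "unspecified"
    else:
        metadata["fill"] = "specified"
        container["fill"] = array(typecode, [fill_value])  # length-1 array

def read_fill(metadata, container):
    """Return the fill value, or None when the file does not specify one."""
    # A missing key is treated as "unspecified", the proposed default.
    if metadata.get("fill", "unspecified") == "specified":
        (fill_value,) = container["fill"]  # fails unless exactly one value
        return fill_value
    return None

# `container` is a plain dict standing in for the HDF5 file (an assumption).
container = {"values": array("d", [1.5, 2.5])}
metadata = {}
write_fill(metadata, container, fill_value=-1.0)

print(json.dumps(metadata))            # {"fill": "specified"}
print(read_fill(metadata, container))  # -1.0
print(read_fill({}, {}))               # None
```

A reader that has no notion of a fill value, such as a GraphBLAS implementation, could skip `read_fill` entirely and ignore the "fill" array.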

jim22k commented 1 year ago

Yes. The only other thing we might want to mention in the spec is that the fill array must have the same data type as the values array, so there is no need to specify the fill data type in the metadata.
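The constraint could be checked along these lines. This is only a sketch: `check_fill` is a hypothetical helper, and `array.array` typecodes stand in for the container's data types.

```python
from array import array

def check_fill(container):
    """Validate a "fill" array against the "values" array it accompanies."""
    fill, values = container["fill"], container["values"]
    if len(fill) != 1:
        raise ValueError("fill must hold exactly one value")
    if fill.typecode != values.typecode:
        raise ValueError("fill and values must share a data type")

good = {"values": array("f", [1.0, 2.0]), "fill": array("f", [0.0])}
bad = {"values": array("f", [1.0, 2.0]), "fill": array("d", [0.0])}

check_fill(good)  # passes silently
try:
    check_fill(bad)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(mismatch_detected)  # True
```

Because the type is implied by the values array, the metadata only has to say whether a fill value is present.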

willow-ahrens commented 1 year ago

I agree. Let's add that to the spec for now. It feels sort of related to https://github.com/GraphBLAS/binsparse-specification/issues/24

eriknw commented 1 year ago

Will the values for "fill" in metadata ever be anything other than "specified" or "unspecified"? If it's boolean by nature, why not use a key in the metadata that stores a boolean?

willow-ahrens commented 1 year ago

Yes, let's make it boolean.