GraphBLAS / binsparse-specification

A cross-platform binary storage format for sparse data, particularly sparse matrices.
https://graphblas.org/binsparse-specification/

Support for index maps: Blocked and diagonal sparse arrays #56

Open hameerabbasi opened 1 month ago

hameerabbasi commented 1 month ago

I'm writing this issue to gauge support for index maps, which define a virtual indexing space alongside a physical storage space. These can be useful to define batched and diagonal sparse arrays. An index map is a mapping from the N virtual indices to M storage indices, using only the operators +, -, *, /, % and parentheses (division is integer division); the usual BODMAS rules apply. This also leads to supporting the diagonal format. I'll raise an issue on the binsparse repo and link it here.

Here is an example field for a blocked CSR array, with a batch size of 3.

"index_map": ["i0 / 3", "i0 % 3", "i1 / 3", "i1 % 3"],

The way this can be interpreted is that there are two virtual dimensions in the array and four physical storage dimensions. i0 and i1 are the virtual indices, and the four elements of index_map tell us how to index into the four physical dimensions using the two virtual indices. Here is another example, for a 5x5 diagonal matrix:

"index_map": ["i0", "i0 + i0 // 5 + i1"],

This can also support NumPy-style "broadcasting" via virtual dimensions. Here is an example corresponding to np.broadcast_to(x[:, None, :], ...) with x.ndim == 2:

"index_map": ["i0", "0 * i1 + i2"],

xref https://github.com/data-apis/array-api/issues/840#issuecomment-2360489542

pearu commented 1 month ago

Notice that there exist subtle issues with supporting batched CSR sparse arrays. For instance, each batch may have a different number of specified elements, meaning the batch of column indices must be a ragged array. There are a couple of ways for an array library that lacks support for ragged arrays to still support batched CSR sparse arrays: (i) support only batches with the same number of specified elements (torch currently uses this approach); (ii) use the last element of the compressed row indices to define the number of specified elements for a given batch, which allows using strided arrays as column indices, with the downside that some items are unused when batches have different numbers of specified elements; see https://github.com/pytorch/pytorch/pull/84843 for a draft implementation.
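
To make approach (ii) concrete, here is a minimal sketch assuming a strided batched-CSR layout in which crow_indices has shape (batch, nrows + 1) and col_indices is padded to the largest batch's number of specified elements; the toy arrays below are my own, not PyTorch's API:

import numpy as np

crow_indices = np.array([[0, 2, 3],    # batch 0: 3 specified elements over 2 rows
                         [0, 1, 2]])   # batch 1: 2 specified elements (1 padded slot)
col_indices = np.array([[0, 1, 1],
                        [1, 0, 0]])    # the last entry of batch 1 is unused padding
for b in range(crow_indices.shape[0]):
    nnz_b = crow_indices[b, -1]        # last element gives this batch's nnz
    print(f"batch {b}: nnz={nnz_b}, cols={col_indices[b, :nnz_b]}")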

hameerabbasi commented 1 month ago

batched

I think I meant blocked. I've modified the issue to reflect this.

pearu commented 1 month ago

In the case of PyTorch BSR/BSC sparse tensors, indices are defined as indices of blocks, not of tensor elements as suggested here.

hameerabbasi commented 1 month ago

In the case of PyTorch BSR/BSC sparse tensors, indices are defined as indices of blocks, not of tensor elements as suggested here.

That's supported without any additional modifications in binsparse, just need ["dense", "compressed", "dense", "dense"] as the nested levels.
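
For reference, a sketch of what that level nesting implies for the stored arrays (array names here are mine, purely illustrative): the outer dense/compressed pair is ordinary CSR over block coordinates, and the two inner dense levels store a (br, bc) block per stored coordinate.

import numpy as np

br, bc = 2, 2
pointers = np.array([0, 1, 3])              # CSR-style pointers for the compressed level
indices = np.array([0, 0, 1])               # compressed indices along dimension 1
values = np.zeros((indices.size, br, bc))   # one dense (br, bc) block per stored index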

BenBrock commented 1 month ago

This is a very interesting idea—so, essentially, you can take any sparse matrix and project it into a higher dimensional space (using an index_map of size > len(shape)) or a lower dimensional space (using an index_map of size < len(shape))?

Custom formats already support a transpose key that allows reordering the dimensions; I think that would be the right place to put this. I have two primary concerns. The first is that this increases the burden on parser implementers: not only do you have to re-order indices, you also have to check the length of the transpose value to determine the logical shape of the matrix, as well as parse several arbitrary functions from strings (this is quite likely not possible to do efficiently in statically compiled languages; not necessarily a deal breaker, but worth pondering). The second is that the implementation needs to have a sparse matrix format that supports every possible type of index map. Not sure how likely this is; maybe @willow-ahrens, who has a sparse tensor compiler, can comment more.

willow-ahrens commented 1 month ago

Many requests for Binsparse parser features (e.g. automatic reformatting, etc.) seem to require a highly capable tensor framework. Finch can currently implement rudimentary support for +, * and % on indices, but supports these kinds of maps a little differently (using wrapper array objects rather than an index map expression). I don't expect most frameworks to have default support for index maps (or even if they support some, they may not support all). However, I think most frameworks could implement a program which writes the tensor to COO, performs math on the coordinates, sorts them, and converts back.

I think that as we consider in-memory formats, we might want to move towards a clearer distinction between optional and required features.

willow-ahrens commented 1 month ago

A more moderate proposal here is allowing some kind of dimension grouping syntax. For example, we could say that (1, 2), (3, 4) represents a block matrix interpretation of a 4-D tensor, and (1, 2, 3) is a flattened 3-tensor.

amjames commented 1 month ago

There is tremendous value in defining a spec which generalizes to established patterns, even if they are not directly supported by the reference implementation or are considered optional, etc. If PyTorch were to consider adopting the binsparse spec, the ability to express and differentiate between blocked and hybrid sparse-compressed layouts would be an important feature (independent of the existence of any implementation, as we already implement conversion mechanisms).

However, I do not think that this proposed mechanism is an ideal way to do it. To parrot some of the points raised above by @BenBrock and @willow-ahrens, it puts a large onus on consumer implementations to be able to parse such expressions.

The spec does seem to have some ambiguities with respect to the custom-format levels sub-specification. I have reviewed the language there, and that assessment is based on it and on some of the examples that define the pre-defined formats in terms of custom formats. I believe that allowing an element to have rank would make it possible to disambiguate between blocked and hybrid layouts, though I am not prepared to say whether it is the best way to do that. Should I convince myself, I would happily open a separate issue to discuss it.

@hameerabbasi

That's supported without any additional modifications in binsparse, just need ["dense", "compressed", "dense", "dense"] as the nested levels

If this is true, then how would one specify the nested levels for a 4-D tensor compressed along dim 1 and with a 2-D dense value? In other words, a CSR matrix with a dense matrix value? It seems to me that this would look the same, with no way in the JSON descriptor to indicate to the consuming library what the correct interpretation of the binary payload is.

The BSR 2-D array with shape (r, c) and block shape (br, bc) has an index space of (r / br, c / bc) and a values array of shape (num_stored_elements, br, bc). The 4-D hybrid tensor with shape (r, c, vr, vc) has an index space of (r, c) and a values shape of (num_stored_elements, vr, vc). If br == vr, bc == vc, both have the same number of stored elements, and the locations thereof align properly, then these two could have identical values arrays, yet the correct interpretation of the full array is very different.
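
A concrete instance of this ambiguity (numbers are my own): a (6, 9) BSR matrix with (2, 3) blocks and a (3, 3, 2, 3) hybrid tensor can carry byte-identical values buffers.

import numpy as np

num_stored_elements, br, bc = 4, 2, 3
values = np.zeros((num_stored_elements, br, bc))
bsr_index_space = (6 // br, 9 // bc)   # (3, 3): indices address the block grid
hybrid_index_space = (3, 3)            # (3, 3): indices address the outer dimensions
print(values.shape, bsr_index_space, hybrid_index_space)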

hameerabbasi commented 1 month ago

If this is true, then how would one specify the nested levels for a 4-D tensor compressed along dim 1 and with a 2-D dense value? In other words, a CSR matrix with a dense matrix value? It seems to me that this would look the same, with no way in the JSON descriptor to indicate to the consuming library what the correct interpretation of the binary payload is.

@amjames Ah, of course -- in this case, I misunderstood what @pearu said earlier:

In the case of PyTorch BSR/BSC sparse tensors, indices are defined as indices of blocks, not of tensor elements as suggested here.

That led me to believe BSR is indeed 4D, not 2D, when indexed. If it is indeed 2D, then my index map proposal handles that; in my proposal the block size is (3, 3), but of course one can modify that string to have a different block size.

it puts a large onus on consumer implementations to be able to parse such expressions

I'm of the opinion that optional or partial support may be okay here. For example, if BSR is supported, an implementation can recognize BSR index-map patterns and then parse just the block size.

amjames commented 1 month ago

@hameerabbasi That is kind of what I am getting at with the idea of a non-scalar 'element'. Rather than needing to identify different patterns from mapping expressions, the block size would be encoded in the shape of an n-D element, which, coupled with the tensor shape, should give you all the information you need to properly interpret the index arrays and the data buffer.

The index_map proposal is still a bit hard on implementations, because even if you do not support blocked sparse layouts you still need to parse the expressions and properly classify them to determine whether they are supported or not.

I am of course assuming there are other use cases where a library may want to support other types of index_map but not support it generically, or not support blocking specifically, etc.

Rather than a non-scalar element, perhaps it would be better expressed as a value modifier in the data types section? In that case a BSR tensor with 3x3 blocks might have levels defined the same as CSR, but with something like:

"data_types": {
    "values": "blocked[int8, 3, 3]"
    ...
}

Actually a "Value Modifier" would be better here. The blocked value interpretation is probably more closely related to complex dtypes than having dense inner arrays as the blocking is hidden from the user (with respect to the shape). The dense inner array pattern is better represented in the format structure as the described levels really "exist".

willow-ahrens commented 1 month ago

If the 4-tensor is really meant to represent a 2-matrix, we should make sure the format reflects this, not the values.

If the requirement is that binsparse must be able to represent every format in every system to be adopted, we need to accept that parsers will be allowed to error when the format is not compatible, or use a basic reformatting algorithm to convert unsupported formats to supported ones.

Upon further reflection, I think that the strategy I laid out to convert between an "index modifier" representation and a regular representation is not so difficult to implement. Like I said before, parsers that do not support blocked formats can simply copy the blocked format to COO, then do simple indexing arithmetic to convert (I, J, i, j) tuples to (I*m + i, J*n + j) tuples, then copy into an appropriate format.
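
A minimal sketch of that fallback, assuming the blocked data has already been expanded to (I, J, i, j) COO coordinates with block shape (m, n); the function name and signature are mine:

import numpy as np

def unblock_coo(I, J, i, j, values, m, n):
    # Convert block coordinates plus in-block offsets to element coordinates.
    rows = I * m + i
    cols = J * n + j
    # Sort lexicographically by (row, col) so the result can be re-compressed
    # into whatever format the consumer supports.
    order = np.lexsort((cols, rows))
    return rows[order], cols[order], values[order]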

hameerabbasi commented 1 month ago

What do people here think of an alternative "tree" representation, to reduce parsing load?

"index_map": ["i0 / 3", "i0 % 3", "i1 / 3", "i1 % 3"],

becomes

"index_map": [
  {"/": ["i0", 3]},
  {"%": ["i0", 3]},
  {"/": ["i1", 3]},
  {"%": ["i1", 3]},
],

Much more complicated to write by hand; potentially easier to generate and parse.
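
For a sense of scale, here is a sketch (mine) of how small an evaluator for the tree form could be, using the node shapes from the example above:

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b,
       "/": lambda a, b: a // b,  # integer division, per the proposal
       "%": lambda a, b: a % b}

def eval_node(node, env):
    if isinstance(node, str):    # a virtual index such as "i0"
        return env[node]
    if isinstance(node, int):    # a literal constant
        return node
    (op, args), = node.items()   # e.g. {"/": ["i0", 3]}
    return OPS[op](eval_node(args[0], env), eval_node(args[1], env))

index_map = [{"/": ["i0", 3]}, {"%": ["i0", 3]}, {"/": ["i1", 3]}, {"%": ["i1", 3]}]
print([eval_node(node, {"i0": 7, "i1": 4}) for node in index_map])  # [2, 1, 1, 1]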

willow-ahrens commented 1 month ago

I'm hesitant about using any expression involving (+, -, *, /, %) to encode blocking, because it's very easy to write equivalent and invalid expressions, and it may require symbolic analysis to simplify an expression and determine whether it is valid. For example, we could write indices that:

  • access out of bounds
  • access the same elements of the matrix repeatedly
  • don't access all of the values inside the binsparse file

If we want to restrict those cases, we could propose a simpler strategy which first splits dimensions, then fuses them. The first vector would specify what dimensions the new split indices would have. If we want to split a (24, 64) matrix into a (6, 4, 32, 32) tensor, then the first vector would be [(6, 4), (32, 32)]. Then, we would fuse with a list of dimension groups. For example, if we wanted to fuse dimension 0 with 2 and dimension 1 with 3, we would write [(0, 2), (1, 3)]. So the format would look like:

"split_sizes": [(6, 4), (32, 32)]
"fused_modes": [(0, 2), (1, 3)]
willow-ahrens commented 1 month ago

@hameerabbasi I think my proposal is similar to yours. I'm much more comfortable with these restricted styles of tensor splitting and fusion because they don't have as many edge cases.

hameerabbasi commented 1 month ago

@willow-ahrens Fair enough; your proposal does address reshaping and blocking, but it doesn't address diagonal matrices or broadcasting. Broadcasting could potentially be handled via a new level format, one that repeats indices and makes everything read-only, but I don't yet see a clean way to represent diagonal matrices.

amjames commented 1 month ago

@willow-ahrens

  • access out of bounds
  • access the same elements of the matrix repeatedly
  • don't access all of the values inside the binsparse file

Are any of these situations actually desirable use cases?

Of this list, I think OOB access is really the only one that should be forbidden. (2) is similar to the iso value modifier, but more general, and can be used to encode a format where ranges of the index space carry the same value without explicitly storing them, something similar to broadcasting where we essentially have a zero stride. (3) would cover layouts like the CSR extension mentioned by @pearu, where some arrays essentially need to be padded to the size of the largest batch unless a ragged array is available.

@hameerabbasi I'm curious where you see broadcasting as a useful concept for data at rest/in-transit. Temporarily inserting stride-0 dimensions so that iterators can be created for operating on arrays is not something I think of as needed for this kind of transfer/storage format.

I do think that diagonal representations are important too. Could those also be represented by a level format that signals that the interpretation of indices at that level changes to represent diagonal offsets?

hameerabbasi commented 1 month ago

I'm curious where you see broadcasting as a useful concept for data at rest/in-transit. Temporarily inserting stride-0 dimensions so that iterators can be created for operating on arrays is not something I think of as needed for this kind of transfer/storage format.

@amjames Notice that this isn't just about over-the-network or on-disk, it's about inter-library exchange of data as well. For example, one could do the following:

def downstream_lib1_fn(arr: array) -> array:
    arr = lib1.from_binsparse(arr)  # Or `lib1.asarray(arr)` for sparse-dense agnostic code
    ret = ...  # Perform ops on `arr` to get `ret`
    return ret

# User code
# `lib2.broadcast_to` produces a view into an existing sparse array
broadcasted_arr = lib2.broadcast_to(lib2_arr, shape)
downstream_lib1_fn(broadcasted_arr)

Here, lib1.from_binsparse would need to know that broadcasted_arr is a broadcasted view into an existing sparse array, and that could potentially be expressed in the format. This is the motivation here: to transfer arrays across library boundaries, regardless of the format.