GraphBLAS / binsparse-specification

A cross-platform binary storage format for sparse data, particularly sparse matrices.
https://graphblas.org/binsparse-specification/
BSD 3-Clause "New" or "Revised" License

A proposal for N-Dimensional sparse arrays #20

Closed willow-ahrens closed 1 year ago

willow-ahrens commented 1 year ago

Here's my proposal for n-dimensional arrays in binsparse, to advocate for Erik's sparse[n] and dense[n] approach. I have tried to keep things as close to the existing standard as possible, so the array names have stayed exactly the same as the proposed 1.0.

I had to move iso from the value type to the format descriptor.

willow-ahrens commented 1 year ago

I've added an example parser in https://github.com/willow-ahrens/Finch.jl/blob/0176e14e7731fa9062ff8e5638ea7fb5b55e9aa9/src/fileio/binsparse.jl

willow-ahrens commented 1 year ago

One note: I included dense[n] because it makes things easy to explain, but since a dense[1] level nested inside another dense[1] level is equivalent to dense[2], I really think we should only support dense[1] to keep things normalized.
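
To make the normalization argument concrete, here is a minimal sketch that rewrites every dense level of rank n as n nested rank-1 dense levels. It assumes the nested dictionary layout used in the JSON example later in this thread ("level", "rank", "subformat", "subformats"); the function name `normalize_dense` is hypothetical and is not taken from the spec or from the Finch parser.

```python
def normalize_dense(fmt):
    """Return a copy of `fmt` in which each dense level has rank 1.

    Assumes the nested layout from this thread: a level is a dict with
    "level", an optional "rank", and either "subformat" (one child) or
    "subformats" (named children, as in a multiplex level).
    """
    if "subformats" in fmt:
        # Levels with several named children: normalize each child.
        children = {name: normalize_dense(sub)
                    for name, sub in fmt["subformats"].items()}
        return dict(fmt, subformats=children)
    if "subformat" not in fmt:
        # Leaf level (for example "element" or "iso"): nothing to do.
        return dict(fmt)
    sub = normalize_dense(fmt["subformat"])
    if fmt["level"] == "dense" and fmt.get("rank", 1) > 1:
        # Expand dense[n] into n nested dense[1] levels.
        for _ in range(fmt["rank"]):
            sub = {"level": "dense", "rank": 1, "subformat": sub}
        return sub
    return dict(fmt, subformat=sub)


dense2 = {
    "level": "dense",
    "rank": 2,
    "subformat": {"level": "element", "value_type": "float32"},
}
normalized = normalize_dense(dense2)
```

Here `normalized` holds two nested rank-1 dense levels wrapping the same element level, which is the only spelling a dense[1]-only spec would allow.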

willow-ahrens commented 1 year ago

Another note: Finch understands things recursively, so that's how I've written the spec and the parser, but I get the sense that people may have been hoping for a list of format descriptors (e.g. "format": ["dense[1]", "sparse[2]"]), which is easier to write by hand but does need a little bit of string parsing to get the rank info out.
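
The string parsing mentioned above is small. The sketch below converts a flat list such as ["dense[1]", "sparse[2]"] into the recursive form. It is an illustration only: the helper names `parse_descriptor` and `nest` are hypothetical, the "name[rank]" grammar is inferred from the example in this comment, and the innermost "element" level with a "value_type" key follows the JSON example later in this thread.

```python
import re

# "name[rank]", e.g. "sparse[2]". This grammar is an assumption based on
# the examples in this thread, not something taken from the spec.
_DESCRIPTOR = re.compile(r"^([a-z_]+)\[(\d+)\]$")


def parse_descriptor(text):
    """Split a descriptor like "sparse[2]" into ("sparse", 2)."""
    match = _DESCRIPTOR.match(text.strip())
    if match is None:
        raise ValueError(f"not a level descriptor: {text!r}")
    level, rank = match.group(1), int(match.group(2))
    if rank < 1:
        raise ValueError(f"rank must be at least 1: {text!r}")
    return level, rank


def nest(descriptors, value_type):
    """Build the recursive format from an outermost-first list of levels."""
    fmt = {"level": "element", "value_type": value_type}
    # Wrap from the innermost level outwards.
    for text in reversed(descriptors):
        level, rank = parse_descriptor(text)
        fmt = {"level": level, "rank": rank, "subformat": fmt}
    return fmt


nested = nest(["dense[1]", "sparse[2]"], "float32")
```

The flat list cannot express levels with several children (such as a multiplex level), so this conversion only goes one way.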

eriknw commented 1 year ago

I see--hierarchical is especially nice when a level spans multiple dimensions (such as COO). This may also be useful for e.g. DIA so there is less concern for how to compose and transform "1-d" levels to get more advanced levels.

willow-ahrens commented 1 year ago

Here's an example of a property graph for CSC with an iso int capacity and a float weight:

{
  "swizzle": [1, 0],
  "format": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "sparse",
      "rank": 1,
      "subformat": {
        "level": "multiplex",
        "subformats": {
          "weight": {
            "level": "element",
            "value_type": "float32"
          },
          "capacity": {
            "level": "iso",
            "value_type": "int32"
          }
        }
      }
    }
  }
}
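
One way to sanity-check a descriptor like this is to walk the nested format, add up the ranks, and confirm that "swizzle" permutes exactly that many dimensions. The sketch below does this in Python under stated assumptions: "swizzle" is read as a 0-based permutation (suggested by [1, 0] above, not confirmed here), every branch of a multiplex level is expected to have the same rank, and the names `total_rank` and `check_descriptor` are hypothetical. The embedded JSON is the example above with one comma added between the two subformats and the trailing comma removed so that a strict parser accepts it.

```python
import json

EXAMPLE = """
{
  "swizzle": [1, 0],
  "format": {
    "level": "dense", "rank": 1,
    "subformat": {
      "level": "sparse", "rank": 1,
      "subformat": {
        "level": "multiplex",
        "subformats": {
          "weight": {"level": "element", "value_type": "float32"},
          "capacity": {"level": "iso", "value_type": "int32"}
        }
      }
    }
  }
}
"""


def total_rank(fmt):
    """Sum the "rank" fields along the nesting; leaf levels count as 0."""
    rank = fmt.get("rank", 0)
    if "subformat" in fmt:
        return rank + total_rank(fmt["subformat"])
    if "subformats" in fmt:
        ranks = {total_rank(sub) for sub in fmt["subformats"].values()}
        if len(ranks) > 1:
            raise ValueError("subformats disagree on rank")
        return rank + (ranks.pop() if ranks else 0)
    return rank


def check_descriptor(desc):
    """Return the array rank, or raise if "swizzle" does not match it."""
    n = total_rank(desc["format"])
    # Assumption: swizzle is a 0-based permutation of the n dimensions.
    if sorted(desc["swizzle"]) != list(range(n)):
        raise ValueError("swizzle is not a permutation of the dimensions")
    return n


rank = check_descriptor(json.loads(EXAMPLE))
```

For the example, the dense and sparse levels each contribute one dimension and the multiplex, element, and iso levels contribute none, so `rank` is 2 and the swizzle [1, 0] passes the check.
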
willow-ahrens commented 1 year ago

@BenBrock @ivirshup Do we think there's a way that we can merge this into the spec in a section titled "Experimental v2.0 extensions (Subject To Change)" or something similar so that it doesn't have to live in a PR while we finalize 1.0?

BenBrock commented 1 year ago

I think that sounds like a fine idea. Ultimately we all want it to live in one spec document, and we'll have to have some way of specifying what's v1 and what's a v2 extension. What you suggested sounds good to me.

willow-ahrens commented 1 year ago

Alright, I'm merging this with language saying it's v2.0 only and subject to further discussion