Closed by willow-ahrens 1 year ago.
I've added an example parser in https://github.com/willow-ahrens/Finch.jl/blob/0176e14e7731fa9062ff8e5638ea7fb5b55e9aa9/src/fileio/binsparse.jl
One note: I included dense[n] because it makes things easy to explain, but since a dense[1] nested inside another dense[1] is equivalent to a dense[2], I really think we should only support dense[1] to keep the representation normalized.
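To make the normalization idea concrete, here is a minimal sketch that rewrites every dense[n] into n nested dense[1] levels. It assumes the recursive dict layout used in the CSC example in this thread ("level", "rank", "subformat", "subformats"); the function name `normalize_dense` is hypothetical and not part of Finch or the spec.

```python
from typing import Any, Dict

Format = Dict[str, Any]


def normalize_dense(fmt: Format) -> Format:
    """Rewrite each dense level of rank n as n nested rank-1 dense levels.

    Assumes the hypothetical recursive layout from this thread; keys other
    than "level", "rank", "subformat" and "subformats" are copied unchanged.
    """
    fmt = dict(fmt)  # shallow copy so the caller's dict is not mutated
    if "subformat" in fmt:
        fmt["subformat"] = normalize_dense(fmt["subformat"])
    if "subformats" in fmt:
        fmt["subformats"] = {k: normalize_dense(v) for k, v in fmt["subformats"].items()}
    if fmt.get("level") == "dense":
        rank = fmt.get("rank", 1)
        fmt["rank"] = 1
        # dense[n] == dense[1] wrapping dense[n-1]: add one wrapper per extra dim.
        for _ in range(rank - 1):
            fmt = {"level": "dense", "rank": 1, "subformat": fmt}
    return fmt


# Usage: a dense 2-D array written with a single rank-2 dense level.
dense2 = {"level": "dense", "rank": 2,
          "subformat": {"level": "element", "value_type": "float64"}}
print(normalize_dense(dense2))
# {'level': 'dense', 'rank': 1, 'subformat': {'level': 'dense', 'rank': 1, 'subformat': {'level': 'element', 'value_type': 'float64'}}}
```

Because the rewrite is idempotent, a reader that only accepts dense[1] could run something like this once on input and then treat the normalized tree as canonical.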
Another note: Finch understands formats recursively, so that's how I've written the spec and the parser, but I get the sense that people may have been hoping for a flat list of format descriptors (e.g. "format": ["dense[1]", "sparse[2]"]), which is easier to write by hand but does need a little string parsing to extract the rank of each level.
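For comparison, the "little bit of string parsing" for the flat-list style could look roughly like this sketch, which turns ["dense[1]", "sparse[2]"] into the recursive form. The regex, the function names, and the extra `leaf` argument (a flat list says nothing about the value level, so it has to come from somewhere) are all assumptions for illustration.

```python
import re
from typing import Any, Dict, List

# Hypothetical grammar for one descriptor: a level name followed by "[rank]".
_DESCRIPTOR = re.compile(r"^([a-z_]+)\[(\d+)\]$")


def parse_descriptor(desc: str) -> Dict[str, Any]:
    """Split a string such as "sparse[2]" into its level name and rank."""
    m = _DESCRIPTOR.match(desc)
    if m is None:
        raise ValueError(f"malformed level descriptor: {desc!r}")
    return {"level": m.group(1), "rank": int(m.group(2))}


def list_to_recursive(descs: List[str], leaf: Dict[str, Any]) -> Dict[str, Any]:
    """Nest a flat, outermost-first list of descriptors around a value level."""
    fmt = leaf
    for desc in reversed(descs):  # build from the innermost level outward
        node = parse_descriptor(desc)
        node["subformat"] = fmt
        fmt = node
    return fmt


fmt = list_to_recursive(["dense[1]", "sparse[2]"],
                        {"level": "element", "value_type": "float64"})
print(fmt)
```

The conversion is mechanical in this direction; the recursive form mainly buys room for levels like multiplex that branch into several subformats, which a flat list cannot express directly.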
I see. The hierarchical form is especially nice when a level spans multiple dimensions (such as COO). It may also be useful for formats like DIA, since there is less concern about how to compose and transform "1-d" levels to build more advanced ones.
Here's an example of a property graph stored as CSC, with an iso int32 capacity and a float32 weight:
{
  "swizzle": [1, 0],
  "format": {
    "level": "dense",
    "rank": 1,
    "subformat": {
      "level": "sparse",
      "rank": 1,
      "subformat": {
        "level": "multiplex",
        "subformats": {
          "weight": {
            "level": "element",
            "value_type": "float32"
          },
          "capacity": {
            "level": "iso",
            "value_type": "int32"
          }
        }
      }
    }
  }
}
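As a rough sanity check on how a reader might consume this descriptor, the sketch below loads it, sums the ranks down the level chain, checks that "swizzle" is a permutation of that many dimensions, and lists the multiplexed value fields. Treating element, iso and multiplex levels as rank 0 and reading swizzle as 0-based are assumptions inferred from the example, not settled spec rules; `total_rank` and `value_fields` are hypothetical helpers.

```python
import json

# The CSC property-graph descriptor from this thread, as valid JSON.
DOC = """
{
  "swizzle": [1, 0],
  "format": {
    "level": "dense", "rank": 1,
    "subformat": {
      "level": "sparse", "rank": 1,
      "subformat": {
        "level": "multiplex",
        "subformats": {
          "weight": {"level": "element", "value_type": "float32"},
          "capacity": {"level": "iso", "value_type": "int32"}
        }
      }
    }
  }
}
"""


def total_rank(fmt):
    """Sum ranks along the level chain; levels without "rank" count as 0 (assumed)."""
    rank = fmt.get("rank", 0)
    if "subformat" in fmt:
        return rank + total_rank(fmt["subformat"])
    if "subformats" in fmt:
        ranks = {total_rank(sub) for sub in fmt["subformats"].values()}
        if len(ranks) != 1:
            raise ValueError("multiplexed subformats disagree on rank")
        return rank + ranks.pop()
    return rank


def value_fields(fmt, prefix=()):
    """Map each leaf value's path of multiplex names to (level, value_type)."""
    if "subformat" in fmt:
        return value_fields(fmt["subformat"], prefix)
    if "subformats" in fmt:
        out = {}
        for name, sub in fmt["subformats"].items():
            out.update(value_fields(sub, prefix + (name,)))
        return out
    return {prefix: (fmt["level"], fmt.get("value_type"))}


doc = json.loads(DOC)
n = total_rank(doc["format"])
assert sorted(doc["swizzle"]) == list(range(n)), "swizzle must permute the dimensions"
print(n, value_fields(doc["format"]))
# 2 {('weight',): ('element', 'float32'), ('capacity',): ('iso', 'int32')}
```

Here the dense and sparse levels contribute one dimension each, so the swizzle [1, 0] is the transpose that makes the stored CSR-style nesting read as CSC.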
@BenBrock @ivirshup Is there a way we can merge this into the spec under a section titled "Experimental v2.0 extensions (Subject To Change)" or something similar, so that it doesn't have to live in a PR while we finalize 1.0?
I think that sounds like a fine idea. Ultimately we all want it to live in one spec document, and we'll have to have some way of specifying what's v1 and what's a v2 extension. What you suggested sounds good to me.
Alright, I'm merging this with language saying it's v2.0 only and subject to further discussion.
Here's my proposal for n-dimensional arrays in binsparse, advocating for Erik's sparse[n] and dense[n] approach. I have tried to keep things as close to the existing standard as possible, so the array names are exactly the same as in the proposed 1.0. I had to move iso from the value type to the format descriptor.