GraphBLAS / binsparse-specification

A cross-platform binary storage format for sparse data, particularly sparse matrices.
https://graphblas.org/binsparse-specification/
BSD 3-Clause "New" or "Revised" License

Handling ISO values #26

willow-ahrens closed 1 year ago

willow-ahrens commented 1 year ago

In response to our discussion around ISO values today, I'll point people to my draft 2.0 spec: https://github.com/GraphBLAS/binsparse-specification/pull/20. Either of the options below is easy for me to parse, but option 2 is a closer match to a level-by-level format spec, so I thought I'd ask what y'all think.

Two main options were discussed today:

  1. Keep the iso value as part of the datatype (e.g. "iso[int8]"). V1.0 is unchanged. V2.0 needs a change so that the Element Level will absorb the functionality of the ISO Level.
  2. Handle things in the formats (e.g. "iso[CSR]"). The 2.0 PR is unchanged, but V1.0 needs a change so that the predefined formats (like "CSR" or "COO") can also be written as "iso[CSR]" or "iso[COO]", and "iso" is no longer part of the datatype.

In option 1, we would add text to the description of an ElementLevel to say that:

If the values_type is `iso`, then the values array is instead to be interpreted as follows:

: values
:: Array of size `1` whose element holds the value of all explicit entries in the sparse tensor.
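
To make the option 1 reading rule concrete, here is a minimal sketch of how a parser might materialize the values. The function name `expand_values` and the `nnz` argument (the number of explicit entries, assumed to be known from the index arrays) are hypothetical and not part of either spec draft. The sketch accepts both the bare `iso` spelling and the "iso[int8]" spelling, since both appear in this thread.

```python
import re

# Matches "iso" or "iso[<type>]"; both spellings are assumptions based on
# the examples in this thread, not a confirmed grammar.
_ISO_TYPE = re.compile(r"iso(\[\w+\])?")


def expand_values(values_type, values, nnz):
    """Return one value per explicit entry.

    For an iso values_type, the values array holds a single element that
    stands for every explicit entry. Otherwise it holds one value per entry.
    """
    if _ISO_TYPE.fullmatch(values_type):
        if len(values) != 1:
            raise ValueError("iso values array must have exactly one element")
        return [values[0]] * nnz
    if len(values) != nnz:
        raise ValueError("expected one value per explicit entry")
    return list(values)
```

A real reader would probably keep the single stored value rather than expanding it; the expansion here only illustrates the intended meaning.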

In option 2, as long as we're in the business of defining format aliases, we would also say that "CSR" is an alias for

{
  "format": {
    "level": "dense",
    "rank": "1",
    "subformat": {
      "level": "sparse",
      "rank": "1",
      "subformat": {
        "level": "element"
      }
    }
  }
}

and "iso[CSR]" is an alias for

{
  "format": {
    "level": "dense",
    "rank": "1",
    "subformat": {
      "level": "sparse",
      "rank": "1",
      "subformat": {
        "level": "iso"
      }
    }
  }
}
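
As a rough sketch of how option 2 could be handled on the parsing side, the alias expansion might look like the following. The table `FORMAT_ALIASES` and the function `expand_format` are hypothetical names; only the "CSR" expansion is taken from this comment, and the "iso[...]" rule is assumed to mean "same nesting, with the innermost level swapped from element to iso".

```python
import copy
import re

# Hypothetical alias table. Only "CSR" is spelled out in this thread.
FORMAT_ALIASES = {
    "CSR": {
        "level": "dense",
        "rank": "1",
        "subformat": {
            "level": "sparse",
            "rank": "1",
            "subformat": {"level": "element"},
        },
    },
}


def expand_format(name):
    """Expand "CSR" or "iso[CSR]" into a nested level descriptor."""
    match = re.fullmatch(r"iso\[(\w+)\]", name)
    base = match.group(1) if match else name
    # Copy so that editing the leaf never mutates the shared alias table.
    fmt = copy.deepcopy(FORMAT_ALIASES[base])
    if match:
        leaf = fmt
        while "subformat" in leaf:
            leaf = leaf["subformat"]
        leaf["level"] = "iso"
    return {"format": fmt}
```

With this shape, "iso[COO]" would work the same way once a "COO" entry is added to the table.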

I really think either is fine here, so a simple poll: ❤️ react for 1, 🚀 react for 2

eriknw commented 1 year ago

A third option is to not have it encoded in the datatype (i.e., not "iso[int8]") or the format (i.e., not "iso[CSR]"), and instead have e.g. CSR as the format, int8 as the datatype, and introduce a new metadata key such as "is_iso": true.
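
For comparison, here is a small sketch showing that all three spellings carry the same information and could be normalized to one internal form. The descriptor keys `"format"` (as a string alias) and `"datatype"` are assumptions for illustration; only `"is_iso"` comes from the suggestion above, and `normalize` is a hypothetical helper.

```python
import re

_ISO = re.compile(r"iso\[(\w+)\]")


def normalize(descriptor):
    """Return (format, datatype, is_iso) for any of the three spellings."""
    fmt = descriptor["format"]        # assumed to be a string alias like "CSR"
    dtype = descriptor["datatype"]    # hypothetical key name
    is_iso = bool(descriptor.get("is_iso", False))  # option 3

    match = _ISO.fullmatch(dtype)     # option 1: "iso[int8]"
    if match:
        dtype, is_iso = match.group(1), True

    match = _ISO.fullmatch(fmt)       # option 2: "iso[CSR]"
    if match:
        fmt, is_iso = match.group(1), True

    return fmt, dtype, is_iso
```

Each of the three encodings of an iso CSR matrix of int8 values normalizes to the same tuple, so the choice mostly affects how the spec text reads rather than what a parser has to do.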

willow-ahrens commented 1 year ago

Closed with #28