GraphBLAS / binsparse-specification

A cross-platform binary storage format for sparse data, particularly sparse matrices.
https://graphblas.org/binsparse-specification/
BSD 3-Clause "New" or "Revised" License

Conventions for converting Matrix Market files to binsparse #33

Open BenBrock opened 1 year ago

BenBrock commented 1 year ago

These are not strictly spec issues, but I have two unresolved questions about how to convert Matrix Market files to binsparse files.

  1. What shall we label the Matrix Market comments? I'm currently using the key "comment", but we could be more explicit and use a key like "MatrixMarket_comment".

  2. Given the following Matrix Market types, what should the default type for values be?

    • real - presumably float32 or float64. I would assume that most ASCII representations have a precision more suitable for float32? Which is more appropriate is difficult to determine empirically, even if you inspect the whole matrix.
    • complex - same issue as real, just slightly more complex
    • integer - I think we need to default to int64, unless we scan the whole matrix for its minimum and maximum values.
    • pattern - This one is straightforward, I think. An ISO-valued matrix with a bint8 value equal to true.
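For the integer case, the whole-matrix scan mentioned above could look like the sketch below. `smallest_int_type` is a hypothetical helper, and it only considers signed types (`int8` to `int64`); whether to also consider unsigned types is left open.

```python
# Candidate signed integer types, narrowest first (signed only, by assumption).
INT_TYPES = [("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)]

def smallest_int_type(values) -> str:
    """Return the narrowest signed type that holds every value (hypothetical helper)."""
    values = list(values)
    if not values:
        return "int64"  # nothing to inspect: fall back to the proposed default
    lo, hi = min(values), max(values)
    for name, bits in INT_TYPES:
        # Two's-complement range for a signed integer of this width.
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return name
    raise OverflowError("values do not fit in int64")
```

Both the minimum and the maximum matter, since a single large negative entry can force a wider type.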
DrTimothyAldenDavis commented 1 year ago

Some of the Matrix Market files require float64; using float32 would lose information. The result looks fine.

BenBrock commented 1 year ago

@DrTimothyAldenDavis Is there any straightforward way to identify which files require float64, or should I just use float64 for everything?

DrTimothyAldenDavis commented 1 year ago

The only way would be to test each entry: convert it from float64 to float32 and back, and check whether the value changed. There is no metadata in the Matrix Market format in general, and there are no structured comments in my Matrix Market files, that record this information.
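A minimal sketch of that round-trip test, assuming the entries have already been parsed into Python floats (which are float64). It uses the standard `struct` module's `"f"` format to round each value to float32; the function names `fits_float32` and `needs_float64` are hypothetical.

```python
import math
import struct

def fits_float32(x: float) -> bool:
    """True if x survives a float64 -> float32 -> float64 round trip."""
    if math.isnan(x):
        return True  # NaN stays NaN (payload bits are ignored here)
    try:
        y = struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return False  # magnitude is outside the float32 range
    return y == x

def needs_float64(values) -> bool:
    """True if any entry would change when stored as float32."""
    return not all(fits_float32(v) for v in values)
```

For example, `needs_float64([1.0, 0.5, -3.25])` is `False` because those values are exact in float32, while `needs_float64([0.1])` is `True`. For complex files, the same check would be applied to both the real and imaginary parts.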

When I create a Matrix Market file from a float64 matrix, I use a method that minimizes the number of ASCII characters printed in the output file. It's very tedious. For each entry, I write it to a string using 1 digit of precision, read it back, and compare. If the numbers differ, I try 2 digits, then 3, and so on, until the value read back is identical; that is the smallest number of digits needed for that particular number. I then use that number of digits to write the floating-point value to the file. That way, I ensure that each matrix is identical in all 3 formats (MATLAB *.mat file, Matrix Market file, and Rutherford Boeing file).
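The digit-by-digit loop described above can be sketched as follows. This is a hypothetical Python illustration of the idea, not the actual code used to produce the files. It uses `%g`-style formatting, where the precision counts significant digits, and relies on the fact that 17 significant digits always suffice to round-trip a finite float64.

```python
def shortest_repr(x: float) -> str:
    """Fewest significant digits (1..17) whose text reads back as exactly x."""
    for digits in range(1, 18):
        text = f"{x:.{digits}g}"  # write with this many significant digits
        if float(text) == x:      # read it back and compare
            return text
    # Only reached for NaN, since NaN never compares equal to itself.
    return repr(x)
```

For example, `shortest_repr(2.5)` first tries `"2"`, which reads back as 2.0, and then returns `"2.5"`. Python's own `repr(float)` already produces a shortest round-trip string, so in Python this loop mainly illustrates the technique; in C the same loop can be written with `snprintf` using `"%.*g"` and `strtod`.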

When I read a Matrix Market file, I always use float64 (if the matrix is real or complex).
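Putting the thread's suggestions together, a converter might map the `field` token of the Matrix Market banner line to a default value type: float64 for real and complex, as recommended above, int64 for integer, and an iso-valued bint8 for pattern. The sketch below is hypothetical; the function name and the exact type strings (in particular `"complex[float64]"`) are assumptions to check against the binsparse specification.

```python
# Assumed binsparse type strings; verify against the spec before relying on them.
DEFAULT_VALUE_TYPE = {
    "real": "float64",
    "complex": "complex[float64]",
    "integer": "int64",
    "pattern": "bint8",  # stored iso-valued, with the single value true
}

def mm_value_type(banner: str) -> tuple[str, bool]:
    """Return (value type, is_iso) for a line like '%%MatrixMarket matrix coordinate real general'."""
    tokens = banner.lower().split()
    if len(tokens) < 5 or tokens[0] != "%%matrixmarket":
        raise ValueError("not a Matrix Market banner line")
    field = tokens[3]  # object, format, field, symmetry follow the %%MatrixMarket token
    if field not in DEFAULT_VALUE_TYPE:
        raise ValueError(f"unsupported field: {field}")
    return DEFAULT_VALUE_TYPE[field], field == "pattern"
```

Keeping the defaults in one table makes it easy to switch, for example, integer to a narrower type chosen by a whole-matrix scan.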