GraphBLAS / LAGraph

This is a library plus a test harness for collecting algorithms that use the GraphBLAS. For test coverage reports, see https://graphblas.org/LAGraph/. Documentation: https://lagraph.readthedocs.org

LAGraph needs a faster Matrix Market reader/writer. #175

Open · DrTimothyAldenDavis opened this issue 1 year ago

jamesETsmith commented 1 year ago

Out of curiosity, are you interested in improving serial Matrix Market I/O or adding parallel options?

mcmillan03 commented 1 year ago

Both would be nice.

DrTimothyAldenDavis commented 1 year ago

Comments from Kasimir Gabert, about one potential solution:

The library/approach I took that showed a 50x improvement in input performance on those machines (while also going from unsorted edge lists with duplicate edges and self-loops) is something we’ve called PIGO:

GitHub: https://github.com/GT-TDAlab/PIGO

Paper: https://doi.org/10.1109/IPDPSW52791.2021.00050

As we discussed, PIGO is written in C++, and so it could not be directly used in the base C GraphBLAS libraries. However, C versions of PIGO’s idea have been built in ~days (I think an early version was developed for hypergraphs for use in PaToH).

As you saw, focusing on I/O really changes how useful graph systems can be for analysts. It isn’t sustainable to wait minutes for I/O when the runtime then takes around ten seconds; further, binary files don’t make sense, as much of the human effort is spent changing / refining / fixing the graph! I am sure that without PIGO there wouldn’t be a full-sized graph for the dataset we are working on. There’s a big difference between 2 seconds and 103 seconds 😊
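
To make “PIGO’s idea” concrete, here is a minimal sketch of the general approach: memory-map the file, split it into per-thread byte ranges snapped to newline boundaries, and parse the coordinate lines in parallel with OpenMP. This is an illustration of the idea only, not PIGO’s actual code; header handling is reduced to the bare minimum and error checking is omitted.

```c++
// Sketch of the mmap + parallel-parse approach (not PIGO's actual code).
// Assumes a coordinate-format .mtx file; indices are left 1-based as they
// appear in the file, and error handling is omitted.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <omp.h>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Edge { uint64_t u, v; };

std::vector<Edge> read_coordinate_mtx(const char *path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    size_t n = (size_t) st.st_size;
    const char *buf = (const char *) mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);

    int nthreads = omp_get_max_threads();
    std::vector<std::vector<Edge>> local(nthreads);

    #pragma omp parallel num_threads(nthreads)
    {
        int t = omp_get_thread_num();
        size_t start = n * t / nthreads;
        size_t end   = n * (t + 1) / nthreads;
        // Snap both boundaries forward to the next newline so that a line
        // straddling a boundary is parsed by exactly one thread.
        if (t > 0)            { while (start < n && buf[start - 1] != '\n') start++; }
        if (t < nthreads - 1) { while (end   < n && buf[end   - 1] != '\n') end++;   }

        size_t pos = start;
        if (t == 0)
        {
            // Skip the banner/comment lines and the "nrows ncols nnz" size line.
            while (pos < end && buf[pos] == '%')
            {
                while (pos < end && buf[pos] != '\n') pos++;
                pos++;
            }
            while (pos < end && buf[pos] != '\n') pos++;
            pos++;
        }

        while (pos < end)
        {
            char *next;
            uint64_t u = std::strtoull(buf + pos, &next, 10);
            uint64_t v = std::strtoull(next, &next, 10);
            local[t].push_back({u, v});
            pos = (size_t) (next - buf);
            while (pos < end && buf[pos] != '\n') pos++;   // skip any value field
            pos++;
        }
    }

    // Concatenate the per-thread edge lists.
    std::vector<Edge> edges;
    for (auto &l : local) edges.insert(edges.end(), l.begin(), l.end());
    munmap((void *) buf, n);
    close(fd);
    return edges;
}
```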

Scott McMillan writes:

Yesterday Kas showed me that he can load an .mtx file 50x faster than LAGraph’s mmread.
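
For reference, the reader being compared against here is the serial LAGraph Matrix Market reader. A minimal sketch of how it is typically called, assuming the LAGraph.h basic API (LAGraph_Init / LAGraph_MMRead / LAGraph_Finalize); error checks are omitted.

```c++
// Minimal sketch of calling the existing serial LAGraph Matrix Market reader
// (assumes the LAGraph.h basic API; error checks omitted).
#include <cstdio>
#include <LAGraph.h>

int main(void)
{
    char msg[LAGRAPH_MSG_LEN];
    LAGraph_Init(msg);                      // initializes GraphBLAS as well

    GrB_Matrix A = NULL;
    FILE *f = fopen("matrix.mtx", "r");
    LAGraph_MMRead(&A, f, msg);             // the serial reader this issue benchmarks
    fclose(f);

    // ... run graph algorithms on A ...

    GrB_Matrix_free(&A);
    LAGraph_Finalize(msg);
    return 0;
}
```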

DrTimothyAldenDavis commented 1 year ago

See also https://github.com/alugowski/fast_matrix_market

alugowski commented 1 year ago

Kasimir eloquently articulated my motivation for writing fast_matrix_market. Such a common and important operation should not be so slow.

Thanks for the PIGO link! There are some great ideas in the paper.

It made me curious how their approach compares to FMM, so I wrote some benchmarks to compare the various loaders: https://github.com/alugowski/sparse-matrix-io-comparison. So far it covers fast_matrix_market, PIGO, LAGraph, GraphBLAS via FMM, and Eigen (native and via FMM).

Quick summary of the points relevant to this discussion:

That is on my laptop (6-core). I would love it if someone tried it on a large machine, as I currently don't have easy access to one.

alugowski commented 1 year ago

> See also https://github.com/alugowski/fast_matrix_market

I hope folks are finding this useful! Feedback very welcome, positive or negative.

For those who don't know, FMM includes ready-made methods to read and write GrB_Matrix and GrB_Vector from/to Matrix Market, and it supports LAGraph's type-information extension. FMM is trivially installable via CMake, and the usage is literally one line.
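
For anyone who wants to try it, here is a minimal sketch of that one-line usage, based on FMM's GraphBLAS binding (fast_matrix_market/app/GraphBLAS.hpp); the FMM README is the authoritative reference for the exact API.

```c++
// Minimal sketch of reading a Matrix Market file straight into a GrB_Matrix
// with fast_matrix_market's GraphBLAS binding (see the FMM README for the
// authoritative usage; error checks omitted).
#include <fstream>
#include <GraphBLAS.h>
#include <fast_matrix_market/app/GraphBLAS.hpp>

int main(void)
{
    GrB_init(GrB_NONBLOCKING);

    GrB_Matrix A = NULL;
    std::ifstream f("matrix.mtx");
    fast_matrix_market::read_matrix_market_graphblas(f, &A);   // the one-liner

    // ... use A ...

    GrB_Matrix_free(&A);
    GrB_finalize();
    return 0;
}
```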

It's complex code, but both MM and GraphBLAS each support several data structures, and there is a lot of complexity if you choose to support more than just the basics. Even more so if you want to minimize duplication of the matrix.

A lot of care has been taken to implement each variation in the most efficient way the GraphBLAS API allows, with less efficient fallbacks where that isn't possible. For example, the write function has about a half-dozen code paths depending on coordinate/array format, row/column ordering, and library feature support. With SuiteSparse:GraphBLAS, writing to Matrix Market is nearly always zero-copy, with data fetched directly from the GrB_Matrix via its iterators. That means a graph occupying the majority of RAM can still be written out, since no second copy is needed.
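
To illustrate the zero-copy path: below is a sketch of the generic SuiteSparse:GraphBLAS GxB row-iterator loop that lets a writer walk the entries of a GrB_Matrix in place. This is the general documented iterator pattern, not FMM's internal code, and it assumes a GrB_FP64 matrix held by row.

```c++
// Sketch of walking a GrB_Matrix in place with the SuiteSparse GxB row
// iterator (a SuiteSparse:GraphBLAS extension), printing entries in
// 1-based Matrix Market coordinate style. Assumes a GrB_FP64 matrix held
// by row; not FMM's internal code.
#include <cinttypes>
#include <cstdio>
#include <GraphBLAS.h>

void dump_coordinates(GrB_Matrix A)
{
    GxB_Iterator it;
    GxB_Iterator_new(&it);
    GxB_rowIterator_attach(it, A, NULL);

    // Seek to the first row; info becomes GxB_EXHAUSTED when no rows remain.
    GrB_Info info = GxB_rowIterator_seekRow(it, 0);
    while (info != GxB_EXHAUSTED)
    {
        GrB_Index i = GxB_rowIterator_getRowIndex(it);
        while (info == GrB_SUCCESS)             // GrB_NO_VALUE means an empty row
        {
            GrB_Index j = GxB_rowIterator_getColIndex(it);
            double aij = GxB_Iterator_get_FP64(it);
            printf("%" PRIu64 " %" PRIu64 " %g\n",
                   (uint64_t) (i + 1), (uint64_t) (j + 1), aij);
            info = GxB_rowIterator_nextCol(it);
        }
        info = GxB_rowIterator_nextRow(it);
    }
    GxB_Iterator_free(&it);
}
```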

FMM also supports complex values, which LAGraph's MM methods do not.