alugowski / sparse-matrix-io-comparison

Compare I/O of sparse matrix libraries
BSD 2-Clause "Simplified" License

sparse matrix I/O comparison

Compare I/O of sparse matrix libraries.

Some benchmarks intentionally include matrix construction time. These timings can be affected by the sort order of the entries in the MatrixMarket file.

Libraries are fetched from their main branches on GitHub. To pin a version, modify the appropriate file in cmake/.

Build

C++ libraries

CMake will pull in all dependencies.

The exception is GraphBLAS: its benchmark is skipped if GraphBLAS is not found. Installing GraphBLAS is up to you; on macOS, brew install suite-sparse works.

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release

This builds everything into the build subdirectory.

Python libraries

In a virtual environment:

pip install -r requirements.txt

Datafiles

The benchmarks look for any *.mtx MatrixMarket files in the current directory and benchmark against these. For benchmarks of non-MatrixMarket formats, the data structure is first populated from the MatrixMarket file and then written to the tested format.
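As an illustration of the discovery step described above, here is a minimal Python sketch that lists the *.mtx files a benchmark run would pick up from a directory. The function name `find_matrix_market_files` is hypothetical, and the actual benchmarks may discover files differently; this only mirrors the "any *.mtx file in the current directory" rule.

```python
from pathlib import Path


def find_matrix_market_files(directory="."):
    """Return the *.mtx files directly inside `directory`, sorted by name.

    Hypothetical helper: it only mirrors the README's stated rule and is
    not taken from the benchmark sources.
    """
    return sorted(p for p in Path(directory).glob("*.mtx") if p.is_file())


if __name__ == "__main__":
    # Show each candidate input and its size in MiB.
    for path in find_matrix_market_files():
        size_mib = path.stat().st_size / 2**20
        print(f"{path.name}: {size_mib:.1f} MiB")
```

Running this from the directory where you plan to launch the benchmarks is a quick way to check which inputs will be used.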

Use any method you wish to create the .mtx files.

generate_matrix_market

Generate randomized MatrixMarket files of a specified size (in MiB):

build/generate_matrix_market 1024

This creates a file named 1024MiB.mtx in the current directory that is 1 GiB in size.
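If you would rather produce a small test file without building the C++ tool, the idea can be sketched in Python as below. This is not how generate_matrix_market is implemented: the function name, the default matrix dimensions, the value format, and the seed are all assumptions made for illustration. It buffers every line in memory, so it is only suitable for small files.

```python
import random


def write_random_matrix_market(path, target_bytes, nrows=1000, ncols=1000, seed=0):
    """Write a random coordinate MatrixMarket file with a body of at least
    `target_bytes` bytes and return the number of entries written.

    Hypothetical sketch: dimensions, value format, and seed are assumptions.
    Coordinates are drawn independently, so duplicate (row, column) pairs
    can occur.
    """
    rng = random.Random(seed)
    lines = []
    body_bytes = 0
    while body_bytes < target_bytes:
        # MatrixMarket indices are 1-based.
        line = f"{rng.randint(1, nrows)} {rng.randint(1, ncols)} {rng.random():.6f}\n"
        lines.append(line)
        body_bytes += len(line)  # ASCII only, so characters == bytes
    with open(path, "w", newline="\n") as f:
        f.write("%%MatrixMarket matrix coordinate real general\n")
        f.write(f"{nrows} {ncols} {len(lines)}\n")  # size line: rows cols nnz
        f.writelines(lines)
    return len(lines)
```

For example, `write_random_matrix_market("tiny.mtx", 1_000_000)` writes a file of about 1 MB. The entry count has to be known before the size line is written, which is why the body is buffered first.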

sort_matrix_market

Some benchmarks, like GraphBLAS, perform much better if the indices are sorted. Use sort_matrix_market to create a sorted copy of a .mtx file:

build/sort_matrix_market 1024MiB.mtx
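To make "sorted" concrete, the following Python sketch orders the entries of a coordinate MatrixMarket file by row index and then by column index, which is the order the example results describe. It is an illustration only, not the implementation of sort_matrix_market; the function name and the explicit destination argument are assumptions, and it loads the whole file into memory.

```python
def sort_matrix_market(src, dst):
    """Copy a coordinate MatrixMarket file with entries sorted by (row, column).

    Hypothetical sketch for small files; the real tool may work differently.
    """
    with open(src) as f:
        lines = f.read().splitlines()

    # Leading lines that start with '%' are the banner and comments.
    n_head = 0
    while lines[n_head].startswith("%"):
        n_head += 1
    header = lines[: n_head + 1]  # banner, comments, and the size line
    entries = [ln for ln in lines[n_head + 1:] if ln.strip()]

    def key(line):
        # Compare indices numerically so that row 10 sorts after row 2.
        row, col = line.split()[:2]
        return int(row), int(col)

    entries.sort(key=key)
    with open(dst, "w", newline="\n") as f:
        f.write("\n".join(header + entries) + "\n")
```

The header and size line are copied unchanged because sorting does not alter the matrix dimensions or the number of entries.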

Run

Run all benchmarks:

build/fmm
build/PIGO
build/graphblas_fmm

Or use Google Benchmark's filter option to run only some benchmarks:

build/fmm '--benchmark_filter=.*read.*'
build/PIGO '--benchmark_filter=.*read.*'
build/graphblas_fmm '--benchmark_filter=.*read.*'

Results

The benchmarks report the end-to-end time, as that is what the end user primarily cares about. This includes overheads and any data structure construction time. For example, the GraphBLAS benchmark may include the time for GrB_Matrix_build in addition to the I/O time. This is intentional.

In addition to the runtime in seconds, each benchmark divides the file size by this time and reports an effective speed in bytes/second. This normalized value makes results easier to compare across files of different sizes.
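The normalization is a single division, sketched here in Python. The helper name `effective_bytes_per_second` is hypothetical; the sample inputs (a 1 GiB file read in 0.494 s) are taken from the first fast_matrix_market read row in the example results.

```python
GIB = 2**30


def effective_bytes_per_second(file_size_bytes, elapsed_seconds):
    """Effective throughput: bytes processed divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return file_size_bytes / elapsed_seconds


if __name__ == "__main__":
    # 1 GiB read in 0.494 s of real time.
    rate = effective_bytes_per_second(1 * GIB, 0.494)
    print(f"{rate / GIB:.2f} GiB/s")  # prints 2.02 GiB/s
```

Because the time is end-to-end, a benchmark that includes matrix construction reports a lower effective speed than its raw file I/O alone would achieve.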

Example results

M1 MacBook Pro with 16GiB RAM, 6 performance and 2 efficiency cores (ARM).

Input data is a random 1GiB file, generated by generate_matrix_market as above, and the same file sorted (by row then column index) by sort_matrix_market as above.

fast_matrix_market

bench_fmm:

----------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                    Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time                0.494 s         0.214 s             1 bytes_per_second=2.02622G/s problem_name=1024MiB.mtx
op:read/impl:FMM/format:MatrixMarket/problem:1/p:8/iterations:1/real_time                0.491 s         0.201 s             1 bytes_per_second=2.03837G/s problem_name=1024MiB.sorted.mtx
op:write/impl:FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time                1.26 s         0.227 s             1 bytes_per_second=876.407M/s problem_name=1024MiB.mtx
op:write/impl:FMM/format:MatrixMarket/problem:1/p:8/iterations:1/real_time                1.25 s         0.231 s             1 bytes_per_second=877.677M/s problem_name=1024MiB.sorted.mtx
op:write/impl:FMM/format:MatrixMarket(pattern)/problem:0/p:8/iterations:1/real_time      0.815 s         0.187 s             1 bytes_per_second=804.211M/s problem_name=1024MiB.mtx
op:write/impl:FMM/format:MatrixMarket(pattern)/problem:1/p:8/iterations:1/real_time      0.824 s         0.185 s             1 bytes_per_second=795.726M/s problem_name=1024MiB.sorted.mtx

10GiB file (note that the machine has 16GiB RAM):

----------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                    Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time                 6.72 s          3.23 s             1 bytes_per_second=1.48919G/s problem_name=10240MiB.mtx
op:write/impl:FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time                14.7 s          2.77 s             1 bytes_per_second=746.948M/s problem_name=10240MiB.mtx
op:write/impl:FMM/format:MatrixMarket(pattern)/problem:0/p:8/iterations:1/real_time       10.0 s          1.94 s             1 bytes_per_second=653.106M/s problem_name=10240MiB.mtx

PIGO

bench_pigo:

----------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                      Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:PIGO/format:MatrixMarket/problem:0/p:8/iterations:1/real_time                                 0.351 s         0.314 s             1 bytes_per_second=2.84972G/s problem_name=1024MiB.mtx
op:read/impl:PIGO/format:MatrixMarket/problem:1/p:8/iterations:1/real_time                                 0.391 s         0.299 s             1 bytes_per_second=2.55914G/s problem_name=1024MiB.sorted.mtx
op:write/impl:PIGO/format:binary/problem:0/p:8/iterations:1/real_time                                      0.922 s         0.356 s             1 bytes_per_second=1066.59M/s problem_name=1024MiB.mtx
op:write/impl:PIGO/format:binary/problem:1/p:8/iterations:1/real_time                                      0.718 s         0.306 s             1 bytes_per_second=1.33625G/s problem_name=1024MiB.sorted.mtx
op:write/impl:PIGO/format:ASCII(MatrixMarket_body_only)/problem:0/p:8/iterations:1/real_time                16.4 s          14.8 s             1 bytes_per_second=62.5738M/s problem_name=1024MiB.mtx
op:write/impl:PIGO/format:ASCII(MatrixMarket_body_only)/problem:1/p:8/iterations:1/real_time                16.4 s          14.8 s             1 bytes_per_second=62.4265M/s problem_name=1024MiB.sorted.mtx
op:write/impl:PIGO/format:ASCII(MatrixMarket_body_only(pattern))/problem:0/p:8/iterations:1/real_time      0.604 s         0.316 s             1 bytes_per_second=1085.65M/s problem_name=1024MiB.mtx
op:write/impl:PIGO/format:ASCII(MatrixMarket_body_only(pattern))/problem:1/p:8/iterations:1/real_time      0.587 s         0.337 s             1 bytes_per_second=1116.13M/s problem_name=1024MiB.sorted.mtx

10GiB file (note that the machine has 16GiB RAM):

----------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                      Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:PIGO/format:MatrixMarket/problem:0/p:8/iterations:1/real_time                                  21.5 s          4.62 s             1 bytes_per_second=475.754M/s problem_name=10240MiB.mtx
op:write/impl:PIGO/format:binary/problem:0/p:8/iterations:1/real_time                                       57.5 s          10.7 s             1 bytes_per_second=170.985M/s problem_name=10240MiB.mtx
op:write/impl:PIGO/format:ASCII(MatrixMarket_body_only)/problem:0/p:8/iterations:1/real_time                 206 s           141 s             1 bytes_per_second=49.6292M/s problem_name=10240MiB.mtx
op:write/impl:PIGO/format:ASCII(MatrixMarket_body_only(pattern))/problem:0/p:8/iterations:1/real_time       44.0 s          8.20 s             1 bytes_per_second=148.965M/s problem_name=10240MiB.mtx

GraphBLAS

Reads include matrix construction time

bench_graphblas_fmm:

-----------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                     Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:GraphBLAS_FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time        5.40 s          5.14 s             1 bytes_per_second=189.543M/s problem_name=1024MiB.mtx
op:read/impl:GraphBLAS_FMM/format:MatrixMarket/problem:1/p:8/iterations:1/real_time       0.925 s         0.676 s             1 bytes_per_second=1106.64M/s problem_name=1024MiB.sorted.mtx
op:write/impl:GraphBLAS_FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time       1.27 s         0.200 s             1 bytes_per_second=864.388M/s problem_name=1024MiB.mtx
op:write/impl:GraphBLAS_FMM/format:MatrixMarket/problem:1/p:8/iterations:1/real_time       1.16 s         0.206 s             1 bytes_per_second=951.295M/s problem_name=1024MiB.sorted.mtx

bench_lagraph:

-----------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:LAGraph/format:MatrixMarket/problem:0/p:8/iterations:1/real_time        18.4 s          18.1 s             1 bytes_per_second=55.5359M/s problem_name=1024MiB.mtx
op:read/impl:LAGraph/format:MatrixMarket/problem:1/p:8/iterations:1/real_time        12.0 s          12.0 s             1 bytes_per_second=85.0187M/s problem_name=1024MiB.sorted.mtx
op:write/impl:LAGraph/format:MatrixMarket/problem:0/p:8/iterations:1/real_time       26.1 s          25.5 s             1 bytes_per_second=37.6224M/s problem_name=1024MiB.mtx
op:write/impl:LAGraph/format:MatrixMarket/problem:1/p:8/iterations:1/real_time       26.3 s          25.4 s             1 bytes_per_second=37.3481M/s problem_name=1024MiB.sorted.mtx

Eigen

Reads include matrix construction time

bench_eigen:

---------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:Eigen/format:MatrixMarket/problem:0/p:8/iterations:1/real_time        14.0 s          13.8 s             1 bytes_per_second=73.1673M/s problem_name=1024MiB.mtx
op:read/impl:Eigen/format:MatrixMarket/problem:1/p:8/iterations:1/real_time        11.7 s          11.6 s             1 bytes_per_second=87.3179M/s problem_name=1024MiB.sorted.mtx
op:write/impl:Eigen/format:MatrixMarket/problem:0/p:8/iterations:1/real_time       24.6 s          24.2 s             1 bytes_per_second=66.6896M/s problem_name=1024MiB.mtx
op:write/impl:Eigen/format:MatrixMarket/problem:1/p:8/iterations:1/real_time       24.4 s          24.2 s             1 bytes_per_second=67.1702M/s problem_name=1024MiB.sorted.mtx

bench_eigen_fmm:

-------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                 Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------------------------------
op:read/impl:Eigen_FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time        2.58 s          2.37 s             1 bytes_per_second=396.515M/s problem_name=1024MiB.mtx
op:read/impl:Eigen_FMM/format:MatrixMarket/problem:1/p:8/iterations:1/real_time        1.83 s          1.61 s             1 bytes_per_second=558.58M/s problem_name=1024MiB.sorted.mtx
op:write/impl:Eigen_FMM/format:MatrixMarket/problem:0/p:8/iterations:1/real_time       1.36 s          1.18 s             1 bytes_per_second=808.776M/s problem_name=1024MiB.mtx
op:write/impl:Eigen_FMM/format:MatrixMarket/problem:1/p:8/iterations:1/real_time       1.35 s          1.19 s             1 bytes_per_second=816.8M/s problem_name=1024MiB.sorted.mtx

Polars

python bench_polars.py

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
op:read/impl:Polars/format:Parquet/0/iterations:1/real_time       0.272 s         0.003 s             1 1024MiB.mtx=0 MM_equivalent_bytes_per_second=3.94835G/s bytes_per_second=1.97559G/s
op:read/impl:Polars/format:Parquet/1/iterations:1/real_time       0.208 s         0.002 s             1 1024MiB.sorted.mtx=1 MM_equivalent_bytes_per_second=5.16654G/s bytes_per_second=1.99667G/s
op:write/impl:Polars/format:Parquet/0/iterations:1/real_time       2.75 s          2.73 s             1 1024MiB.mtx=0 MM_equivalent_bytes_per_second=390.834M/s bytes_per_second=200.25M/s
op:write/impl:Polars/format:Parquet/1/iterations:1/real_time       2.70 s          2.69 s             1 1024MiB.sorted.mtx=1 MM_equivalent_bytes_per_second=398.328M/s bytes_per_second=157.633M/s

10GiB file (note that the machine has 16GiB RAM):

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
op:read/impl:Polars/format:Parquet/0/iterations:1/real_time        27.7 s         0.053 s             1 10240MiB.mtx=0 MM_equivalent_bytes_per_second=387.431M/s bytes_per_second=198.52M/s
op:write/impl:Polars/format:Parquet/0/iterations:1/real_time       37.3 s          30.0 s             1 10240MiB.mtx=0 MM_equivalent_bytes_per_second=287.603M/s bytes_per_second=147.368M/s