JuliaSparse / SuiteSparseGraphBLAS.jl

Sparse, General Linear Algebra for Graphs!

How to set the number of parallel threads? #66

Open learning-chip opened 2 years ago

learning-chip commented 2 years ago

The benchmark section shows faster execution with more threads. However, I cannot reproduce such parallel scaling.

The benchmark script:

using Random: seed!
using SparseArrays
using SuiteSparseGraphBLAS
using BenchmarkTools

@show Sys.CPU_THREADS
@show get(ENV, "OMP_NUM_THREADS", nothing)
@show get(ENV, "MKL_NUM_THREADS", nothing)
@show get(ENV, "OPENBLAS_NUM_THREADS", nothing)
@show get(ENV, "JULIA_NUM_THREADS", nothing)

seed!(0)
A = sprand(Float64, 10000, 10000, 0.05)
B = sprand(Float64, 10000, 1000, 0.1)

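# Baseline with the built-in SparseArrays matmul (interpolate globals with `$`):
# @btime $A * $B

# Convert to GraphBLAS matrices
A_gb = GBMatrix(A)
B_gb = GBMatrix(B)

# Interpolate the globals so @btime times only the multiplication
@btime $A_gb * $B_gb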

Run with:

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export JULIA_NUM_THREADS=1
julia ./graphblas_timing

No matter which of the *_NUM_THREADS variables I change, the execution time stays the same (~180 ms on my 112-core machine). That is about 7x faster than the built-in sparse matmul (~1.2 s). However, I don't know how many threads it is actually using, and I don't seem to be able to change that number.

rayegun commented 2 years ago

Use SuiteSparseGraphBLAS.gbset(:nthreads, <NUMTHREADS>) to set the number of threads. You can read the current value with SuiteSparseGraphBLAS.gbget(:nthreads).

This interface needs to be both improved and better documented, sorry about that. This is what it currently does: gbset(:nthreads, Sys.CPU_THREADS ÷ 2) on startup. I will likely change that at some point to use one of the environment variables above.
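As a rough illustration of the environment-variable idea, here is a minimal sketch of how a startup default could be chosen. The helper name pick_nthreads and the choice of OMP_NUM_THREADS are assumptions made for this example only; nothing like this is confirmed to exist in SuiteSparseGraphBLAS.jl. It falls back to the Sys.CPU_THREADS ÷ 2 rule described above when the variable is unset or not a positive integer.

```julia
# Hypothetical helper, NOT part of SuiteSparseGraphBLAS.jl.
# Picks a thread count from OMP_NUM_THREADS if it holds a positive integer,
# otherwise falls back to half the CPU threads (the default described above).
function pick_nthreads(env=ENV; cpu_threads=Sys.CPU_THREADS)
    raw = get(env, "OMP_NUM_THREADS", nothing)
    if raw !== nothing
        n = tryparse(Int, strip(raw))      # `nothing` if not a plain integer
        if n !== nothing && n > 0
            return n
        end
    end
    return max(1, cpu_threads ÷ 2)         # never return fewer than 1 thread
end
```

With the package loaded, the result could then be passed along as SuiteSparseGraphBLAS.gbset(:nthreads, pick_nthreads()).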

Note that it probably isn't going to use 56 threads on a problem of that size (or, if it does, it won't scale well). For most of the internal kernels you can observe what's happening with gbset(:burble, true). That makes SuiteSparse:GraphBLAS print its internal diagnostic information, which includes the number of threads used.
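To check whether the thread setting affects runtime, one option is to sweep over a few thread counts and time the same operation at each. The sketch below is only an illustration: thread_sweep is a hypothetical helper written for this example, and it takes the "set threads" and "do work" steps as functions so it does not depend on any particular package version. The commented lines at the end show how it might be wired to the gbset call mentioned above.

```julia
# Hypothetical helper: time `work()` once per thread count in `counts`.
# `set_threads(n)` is whatever call changes the thread count.
function thread_sweep(set_threads, work, counts; repeats=3)
    results = Pair{Int,Float64}[]
    for n in counts
        set_threads(n)
        work()                       # warm-up run, excludes compilation time
        best = Inf
        for _ in 1:repeats
            t = @elapsed work()      # wall-clock seconds for one run
            best = min(best, t)
        end
        push!(results, n => best)    # keep the fastest of `repeats` runs
    end
    return results
end

# Possible use with the script from this issue (assumes A_gb and B_gb exist):
# using SuiteSparseGraphBLAS
# for (n, t) in thread_sweep(n -> SuiteSparseGraphBLAS.gbset(:nthreads, n),
#                            () -> A_gb * B_gb, (1, 2, 4, 8))
#     println("nthreads = ", n, "  best time = ", round(t * 1e3; digits=1), " ms")
# end
```

If the best time does not change across thread counts, turning on gbset(:burble, true) for a single run should show how many threads the kernel actually used.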

learning-chip commented 2 years ago

SuiteSparseGraphBLAS.gbset(:nthreads, <NUMTHREADS>)

This works well, thanks!

rayegun commented 2 years ago

I'm going to leave this open until I find a better interface.

corbett5 commented 9 months ago

In a similar vein, do you know if multi-threading works on Apple ARM chips? Changing the number of threads with gbset changes the thread count reported by burble, but it has no effect on the runtime or the CPU usage.