JuliaSparse / SuiteSparseGraphBLAS.jl

Sparse, General Linear Algebra for Graphs!

Vector dot is much slower than built-in operation #69

Open learning-chip opened 2 years ago

learning-chip commented 2 years ago

I can get decent parallel speed-up for sparse matmul and sparse matvec, but the dot product between two vectors seems very slow:

using SuiteSparseGraphBLAS
using BenchmarkTools

gbset(:nthreads, 16)

b = ones(10000)
b_gb = GBVector(b)

@btime b' * b  #  1 μs
@btime b_gb' * b_gb  # 15 μs

Is this expected, or can it be tuned to be faster?

Version: SuiteSparseGraphBLAS@0.7.0
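
As an aside, one way to see what GraphBLAS actually does for this call (which kernel it selects and how many threads it uses) is the burble diagnostic. This is a hedged sketch assuming gbset exposes the C library's GxB_BURBLE switch as the :burble option:

using SuiteSparseGraphBLAS

# Assumption: gbset(:burble, true) enables the GraphBLAS burble trace,
# which prints the kernel chosen and the thread count for each operation.
gbset(:burble, true)

b_gb = GBVector(ones(10000))
b_gb' * b_gb    # the printed trace shows how the dot/reduction is executed

gbset(:burble, false)    # turn the trace back off before benchmarking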

rayegun commented 2 years ago

I do see this behavior (although more like 10x on my device). The big thing is that SuiteSparse:GraphBLAS is not a replacement for BLAS1 operations. It's a sparse matrix library, so it will always be somewhat slower than dense BLAS for simple level-1 operations like this dot product.

That being said, we can probably do better here, perhaps by unpacking the vectors, doing the actual BLAS1 call, and repacking, at least for the basic arithmetic (+, *) semiring.

We could also not be compiling at -O3 for some reason; I'll check on that, and talk to Tim Davis.
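
A minimal sketch of the unpack-and-call-BLAS1 idea, assuming both vectors are fully dense and that nnz is extended for GBVector as for other sparse array types. fast_dot is a hypothetical helper, not part of the package, and Vector(...) is used as a copying stand-in for a real zero-copy pack/unpack of the underlying dense buffer:

using SuiteSparseGraphBLAS
using LinearAlgebra
using SparseArrays: nnz

# Hypothetical helper: route the (+, *)-semiring dot product of two fully
# dense GBVectors through BLAS1 instead of the GraphBLAS reduction.
# Vector(v) copies here; a real implementation would unpack/repack the
# underlying buffer to avoid that copy.
function fast_dot(a::GBVector{Float64}, b::GBVector{Float64})
    if nnz(a) == length(a) && nnz(b) == length(b)
        return dot(Vector(a), Vector(b))   # dense case: hits BLAS1 ddot
    end
    return a' * b                          # otherwise fall back to GraphBLAS
end

Whether this wins in practice depends on how cheap the unpack/repack is; the point is only that the dense, standard-semiring case could be special-cased.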