Closed — issue closed by deklanw 1 month ago
Thanks a lot for bringing this library to our attention.
matvec was created because I failed to find this functionality, probably because I was searching for the wrong keywords. That said, if you send a wrapped scipy matrix to pygrank after enabling the matvec backend, the library should work just fine (if an error occurs, please report the stack trace and the offending code, because backends are interoperable for forward computations: once you input data to the library, all backends should yield the same outputs barring numerical precision differences). Note, though, that matvec uses twice the memory, because it internally copies the scipy data once.
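To make the interoperability claim concrete, here is a minimal scipy/numpy-only sketch of the kind of forward-computation equivalence described above; it does not use pygrank's or matvec's actual APIs, just two computation paths over the same data:

```python
import numpy as np
import scipy.sparse as sp

# Build a small random sparse matrix and a dense input vector.
A = sp.random(100, 100, density=0.05, format="csr", random_state=42)
x = np.random.default_rng(0).random(100)

y_sparse = A @ x           # "sparse backend" path
y_dense = A.toarray() @ x  # "dense backend" path

# Same outputs, barring numerical precision differences.
assert np.allclose(y_sparse, y_dense)
```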
The timing differences you report for simple multiplication stem mostly from the fact that matvec stores matrices in COO format, which leads to a different parallelization strategy. I think it could be even faster in these specific cases if it were made to handle CSR or CSC matrices, but since this is a side project I don't know whether I will get to work on it. At any rate, the format yielding the fastest multiplication strategy differs depending on graph characteristics (especially the mean and standard deviation of node degrees).
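For reference, scipy itself exposes both formats, so the COO-versus-CSR distinction can be sketched directly; the two layouts implement the same product, and only the storage traversal (and hence the parallelization strategy a library can apply) differs:

```python
import numpy as np
import scipy.sparse as sp

A_csr = sp.random(1000, 1000, density=0.01, format="csr", random_state=1)
A_coo = A_csr.tocoo()  # matvec-style storage: (row, col, value) triplets
x = np.random.default_rng(1).random(1000)

# Identical results from either format.
assert np.allclose(A_csr @ x, A_coo @ x)
```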
Matvec also has memory-reuse and parallelized element-by-element matrix operation optimizations, so in theory its running times scale sub-linearly with the number of nodes as well (not only with the number of edges).
I am looking into creating a sparse_dot_mkl backend. Development-wise this is trivial (it needs about 5 lines of code), but I have little experience with setting up MKL, and everybody seems to use it out of the box with conda, whereas pygrank's development and CI pipelines use pip. I will update this issue if there is progress on that front (releases are not published without passing both local and CI tests).
Glad it's helpful.
What about checking for the presence of MKL when the numpy backend is used and using sparse_dot_mkl if it's present, otherwise warning that installing it could bring significant speed-ups?
After thinking about it, the inverse could be done: have the sparse_dot_mkl backend fall back to numpy, with a warning, if MKL linking fails. For the time being, I will move in this direction.
Released version 0.2.10 with a sparse_dot_mkl backend that falls back to numpy.
It passes local tests, but, if possible, can you also confirm @deklanw that it works on your end?
Tested on Intel's Python distribution. The "sparse_dot_mkl" backend works correctly but is similar in speed to the "numpy" backend when numpy and scipy are compiled with MKL.
I saw that you have your own sparse matrix library called matvec which parallelizes sparse-dense multiplication. There is an existing Python library called sparse_dot which does the same but with scipy csr/csc matrices: https://github.com/flatironinstitute/sparse_dot.
I benchmarked the two with a matrix of size
With the 32-core/64-thread server CPU I'm testing on, the times for 10 matrix-vector multiplications on the right and on the left are:
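The reported numbers are not reproduced here, but a minimal scipy-only harness for this style of benchmark might look like the following (the matrix size and density are small placeholders, not the ones actually benchmarked):

```python
import timeit
import numpy as np
import scipy.sparse as sp

n = 10_000
A = sp.random(n, n, density=1e-3, format="csr", random_state=0)
x = np.random.default_rng(0).random(n)

# Time 10 right (A @ x) and 10 left (x @ A) matrix-vector products.
t_right = timeit.timeit(lambda: A @ x, number=10)
t_left = timeit.timeit(lambda: x @ A, number=10)
print(f"right: {t_right:.4f}s  left: {t_left:.4f}s")
```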
The times look competitive. I'm not sure whether matvec has other advantages I'm not considering here, but the fact that sparse_dot works with the existing scipy types would be a huge benefit (for my use case, at least). sparse_dot does require installing the MKL library, and for giant matrices like the above it requires the environment variable MKL_INTERFACE_LAYER=ILP64.
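Since MKL reads that variable when it is first loaded, it has to be set before sparse_dot_mkl is imported; the following sketch sets it in-process (exporting it in the shell before launching Python works equally well):

```python
import os

# Select MKL's 64-bit-integer (ILP64) interface, needed for matrices
# whose index ranges exceed 32-bit limits. Must run before any MKL load.
os.environ["MKL_INTERFACE_LAYER"] = "ILP64"
# import sparse_dot_mkl  # the import must come after the assignment
```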