JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

The best practice for sparse matrix multiplication #6

Closed JiaweiZhuang closed 6 years ago

JiaweiZhuang commented 6 years ago

The information below is terribly outdated. See the latest comparisons at: https://github.com/JiaweiZhuang/sparse_dot


Since #2 is largely solved, I've done some preliminary tests on sparse matrix multiplication (SMM) using the sparse package. I then realized that the basic scipy.sparse.coo_matrix.dot() is enough: I can flatten the input array to shape [Ntime*Nlev, Nlat*Nlon] so it can be passed to coo_matrix.dot().
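A minimal sketch of the flatten-and-dot approach, with shrunken stand-in sizes and a random sparse matrix in place of real regridding weights (the real test below uses 600x400 -> 400x300 with 10 levels and 50 time steps):

```python
import numpy as np
import scipy.sparse as sps

# Shrunken illustrative sizes; any real weights would come from ESMPy
nlat_in, nlon_in = 60, 40
nlat_out, nlon_out = 40, 30
nlev, ntime = 10, 5

# Random stand-in weights with shape (n_points_out, n_points_in)
weights = sps.random(nlat_out * nlon_out, nlat_in * nlon_in,
                     density=0.01, format='coo', random_state=0)

data_in = np.random.default_rng(0).standard_normal((ntime, nlev, nlat_in, nlon_in))

# Flatten to [Ntime*Nlev, Nlat*Nlon] so the 4-D field becomes a 2-D matrix
data_2d = data_in.reshape(ntime * nlev, nlat_in * nlon_in)
out_2d = weights.dot(data_2d.T).T        # the actual SMM
data_out = out_2d.reshape(ntime, nlev, nlat_out, nlon_out)
```

The reshape calls are views (no copy), so the only real work is the sparse-dense product itself.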

The test input data has the shape (600, 400, 10, 50), which is then regridded to shape (400, 300, 10, 50). The timing results are:

| Method | Applying weights | Write output data to file |
|---|---|---|
| ESMPy | 0.3 s [1] + 1 s [2] | 2 s [3] |
| coo_matrix.dot | 0.6 s | 2 s |
| sparse.tensordot [4] | 1 s | 4 s |
| loop with numba | 3 s | 2 s |
| loop with pure numpy | 6 s | 2 s |

It is quite surprising that sparse.tensordot() is 3 times faster than ESMPy's internal Fortran routine.

[1]: Passing input data to the Fortran internal. It also doubles memory use, which is not a problem for the other methods.
[2]: Actual SMM.
[3]: Slow, due to the use of Docker. Would be ~0.3 s on a native machine.
[4]: The slow-down compared to scipy is due to memory layout (C-ordering vs Fortran-ordering). See the notebook below.
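Footnote [4] attributes the slow-down to memory layout. A minimal sketch (arbitrary sizes, random stand-in weights) showing that C-ordered and Fortran-ordered operands give identical products, so the layout affects only speed, not results:

```python
import numpy as np
import scipy.sparse as sps

# Arbitrary illustrative sizes; random weights stand in for real ones
A = sps.random(500, 1000, density=0.01, format='csr', random_state=0)

x_c = np.ones((1000, 50))        # C-ordered (numpy default)
x_f = np.asfortranarray(x_c)     # same values, Fortran-ordered memory

# Identical results; only the memory access pattern (and thus speed) differs
assert np.allclose(A.dot(x_c), A.dot(x_f))
```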

The notebooks are available at

It might be useful to have both ESMPy and scipy as different backends. ESMPy results can be used as reference. At least one non-ESMPy method is needed since ESMPy cannot reuse offline weights right now.

@bekozi If you have time (no hurry), could you run my notebooks to see whether it is true that ESMPy's SMM is slower than Python's? I want to make sure the comparison was fair and that I was not using ESMPy incorrectly.


This thread is for discussing the best practice for SMM (i.e. applying weights).

Before developing high-level APIs I want to make sure I am using the most efficient way to perform SMM. Any suggestions will be appreciated. @mrocklin @pelson

JiaweiZhuang commented 6 years ago

Because the regridding is so fast even for this relatively large data, I will postpone #3 and focus on the serial implementation first.

mrocklin commented 6 years ago

It is quite surprising that sparse.tensordot() is 3 times faster than ESMPy's internal Fortran routine.

sparse.tensordot just calls scipy.sparse.csr_matrix.dot, which is written in C. It is not surprising to see it perform decently. There is some cost in converting from NDArray COO format to 2d CSR, but hopefully this is relatively small. It is also cached if used repeatedly.
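As a small illustration of the conversion step mentioned above (using plain scipy rather than the n-dimensional sparse package): converting COO to CSR is a one-off cost, and both formats produce the same product.

```python
import numpy as np
import scipy.sparse as sps

# Random stand-in matrix with arbitrary sizes
coo = sps.random(300, 400, density=0.02, format='coo', random_state=0)
csr = coo.tocsr()                 # one-off format-conversion cost

x = np.ones((400, 8))
# CSR is the format backing the fast compiled dot kernel;
# either format yields the same result
assert np.allclose(coo.dot(x), csr.dot(x))
```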

mrocklin commented 6 years ago

Is it necessary for ESMPy to write outputs to file or can it pass results back as numpy arrays?

JiaweiZhuang commented 6 years ago

Is it necessary for ESMPy to write outputs to file or can it pass results back as numpy arrays?

Currently it has to write outputs to file. I don't know the ESMPy internals, so I need to ask @bekozi.

JiaweiZhuang commented 6 years ago

According to the timing tests, scipy.sparse.coo_matrix.dot should be the most efficient method for applying weights.
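A minimal timing harness (shrunken, arbitrary sizes; random stand-in weights) for reproducing this comparison on a given machine:

```python
import timeit
import numpy as np
import scipy.sparse as sps

# Random stand-in weights and data, much smaller than the real test case
weights = sps.random(1200, 2400, density=0.01, format='coo', random_state=0)
data = np.random.default_rng(0).standard_normal((2400, 500))

# Time the COO product and the CSR product (conversion included for CSR)
t_coo = timeit.timeit(lambda: weights.dot(data), number=5)
t_csr = timeit.timeit(lambda: weights.tocsr().dot(data), number=5)
print(f'coo_matrix.dot: {t_coo:.4f} s, csr_matrix.dot: {t_csr:.4f} s')
```

Absolute numbers will differ across machines and scipy versions; only the relative ranking is of interest.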