Open j-i-l opened 1 month ago
Regarding the tests, there is now a little helper function that allows to compare matrices with not necessarily sorted indices:
@alexbovet
Regarding the csr_
operations, would it make sense to perform an in-place sort_indices() before returning the csr_matrix
?
@alexbovet Regarding the
csr_
operations, would it make sense to perform an in-place sort_indices() before returning thecsr_matrix
?
Is it necessary?
@alexbovet Regarding the
csr_
operations, would it make sense to perform an in-place sort_indices() before returning thecsr_matrix
?Is it necessary?
Unsure. It could matter, depending on how operations are implemented. If it is looped over the indices and some </>
are used, then we might skip some indices.
This will, in general, depend on the implementation, I guess.
Having just a quick look at our own csr_
operations we use something a bit along these lines in the cython_csr_csrT_matmul
function:
@alexbovet OK in our implementations it can matter!: If we take the example data from the comment:
import numpy as np
from scipy.sparse import csr_matrix
from flowstab.SparseStochMat import csr_csrT_matmul
n_data = np.array([0.16911086, 0.26525225, 0.04051438, 0.09085281, 0.55254878, 0.82202173,
0.51055564, 0.50439578, 0.85289212, 0.74480397]) # native
data = np.array([0.16911086, 0.26525225, 0.04051438, 0.09085281, 0.55254878, 0.82202173,
0.51055564, 0.85289212, 0.50439578, 0.74480397]) # csr_add
# indicies
n_indices = np.array([0, 1, 1, 2, 3, 3, 4, 5, 7, 9]) # native
indices = np.array([0, 1, 1, 2, 3, 3, 4, 7, 5, 9]) # csr_add
# indptr
n_indptr = np.array([ 0, 0, 1, 2, 2, 4, 5, 5, 7, 10, 10]) # native
indptr = np.array([ 0, 0, 1, 2, 2, 4, 5, 5, 7, 10, 10]) # csr_add
An = csr_matrix((n_data, n_indices, n_indptr))
A = csr_matrix((data, indices, indptr))
np.testing.assert_equal(An.toarray(), A.toarray()) # make sure it's the same array repr.
so we have A
as unsorted output and An
as sorted, and use csr_csrT_matmul
:
In [22]: np.testing.assert_equal(csr_csrT_matmul(A, A).toarray(), csr_csrT_matmul(An, An).toarray())
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[22], line 1
----> 1 np.testing.assert_equal(csr_csrT_matmul(A, A).toarray(), csr_csrT_matmul(An, An).toarray())
...
AssertionError:
Arrays are not equal
Mismatched elements: 1 / 100 (1%)
Max absolute difference among violations: 0.2544151
Max relative difference among violations: 0.16557306
ACTUAL: array([[0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ],
[0. , 0.028598, 0. , 0. , 0. , 0. ,...
DESIRED: array([[0. , 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ],
[0. , 0.028598, 0. , 0. , 0. , 0. ,...
If we sort the indices of A
things are OK again:
In [23]: A.sort_indices()
In [23]: np.testing.assert_equal(csr_csrT_matmul(A, A).toarray(), csr_csrT_matmul(An, An).toarray())
In [24]:
Unless I'm doing something illogical here, it seems to me that we should sort the csr matrices before returning them.
Checking if the input matrices have sorted indices could be a good thing.
Okay, I see. It matters for the Cython code I wrote. But I guess the scipy functions take into account that indices may not be sorted. In the end, I don't think I used my Cython implementation of the matrices operations for stoch_mat. I always converted to scipy csr to multiply or add them as it was faster. But anyway, it would be a good idea to force a sorting of the indices before calling any cython functions.
It is expected that the indices may not be sorted after some operations. csr matrices have a
has_sorted_indices
attribute. If you do asort_indices()
andeliminate_zeros()
before thenp.testing.assert_allclose()
, it should hopefully fix the problem._Originally posted by @alexbovet in https://github.com/alexbovet/flow_stability/issues/35#issuecomment-2353056512_