alexbovet / flow_stability

Flow stability framework for dynamic community detection in temporal networks
GNU Lesser General Public License v3.0
24 stars 4 forks source link

Type declaration for `indices` and `indptr` might lead to dtype mismatch #43

Open j-i-l opened 3 months ago

j-i-l commented 3 months ago

When passing the arguments of a scipy.sparse.csr_matrix to sparse_stoch_from_full_csr we run into a ValueError:

long long[:] Tf_indices, E ValueError: Buffer dtype mismatch, expected 'long long' but got 'int'

_See e.g. here_

We can address this by explicitly converting the indices to np.int64 when passing them as arguments, or we might change datat type to int instead of long long in sparse_stochfrom_full_csr (and others).

Note that for some functions in __cython_sparsestoch.pyx already declare int[:] as data type when indices are passed as arguments.


          We have some issues with the type declarations for `indices` and `indptr`:

https://github.com/alexbovet/flow_stability/actions/runs/10513029536/job/29127746857#step:5:116

It also seems not completely consistent in the defs:

E.g. for indptr some functions require int:

https://github.com/alexbovet/flow_stability/blob/8648a8bd715e3b086ed13cb7453d41754c176398/src/flowstab/_cython_sparse_stoch.pyx#L859-L863

some long long: https://github.com/alexbovet/flow_stability/blob/8648a8bd715e3b086ed13cb7453d41754c176398/src/flowstab/_cython_sparse_stoch.pyx#L613-L618

I'd put everything indices and indptr to long and declare with np.int64 on the python site. @alexbovet is that OK for you or is there actually a use-case for long long?

_Originally posted by @j-i-l in https://github.com/alexbovet/flow_stability/issues/35#issuecomment-2305337823_

j-i-l commented 3 months ago

https://github.com/alexbovet/flow_stability/pull/35#issuecomment-2306467560

We might want to add the possibility to install flowstab in "large" data mode where we consistently use long long for all index arrays.

alexbovet commented 3 months ago

Yes, that could be a good option.

I remember there was an issue with the connected_components function from scipy.csgraph which cannot use int64 (here. Something to keep in mind.

Also, I think this is a discussion they have at scipy, and other projects.

j-i-l commented 2 months ago

In C++ we could simply use template functions which would allow us not having to specify the type of indices and indptr.

I'm unsure if/how this can be done in cython...

Actually, scipy as nice examples of this approach in their sparsetools:

https://github.com/scipy/scipy/blob/9e93f673c4021cc6dc38487dbe8fb2db8cadf131/scipy/sparse/sparsetools/csr.h#L103-L116

That might be a good approach for us: We would only need to set the type in the numpy representation, i.e. np.int32 or np.int64 and the rest would follow... well that would be the idea.