Qiskit / rustworkx

A high performance Python graph library implemented in Rust.
https://www.rustworkx.org
Apache License 2.0
1.05k stars 146 forks source link

Return sparse matrix instead of dense matrix from adjacency matrix functions #1081

Open sambaPython24 opened 7 months ago

sambaPython24 commented 7 months ago

networkx gives adjacency matrices as sparse matrices and that is necessary, since for larger graphs with >100000 nodes with sparse connections, we can have a sparse e.g. coo matrix but a dense matrix does usually not fit into the memory.

I suggest, you give the option to return either a sparse (coo) matrix or its values and coordinates (as an option).


Comparison networkx

IvanIsCoding commented 7 months ago

I think converting the graphs into a sparse matrix would be really helpful. But on the Rust side https://github.com/sparsemat/sprs/issues/289 is going to be a big blocker.

We could also try implementing types that are a read-only CSC/CSR matrix from Rust, but I am not sure how useful that is for users

mtreinish commented 7 months ago

At least in this case since we just need to generate a sparse matrix. I think this one isn't actually super hard, albeit not as potentially optimized as possible. We should be able construct 1d arrays for each of the inputs like (row, col, data) or (indptr, indices, data) as readonly numpy arrays directly from rust fairly efficiently by iterating over the graph and then it's just a matter of calling the appropriate scipy python side sparse matrix constructor with those arrays we generate.

The thing I'm mostly hesitant about is adding a dependency on scipy to the package, as it's fairly heavy and we're not actually going to be using it for anything inside rustworkx, just for interop with sparse matrices which is a fairly small part of scipy. I wonder if we had a function like rustworkx.sparse_adjacency_matrix(graph: PyGraph) -> (PyArray, PyArray, PyArray) or something like that if it would be enough. Then as an end user you would call:

scipy.sparse.csc_matrix(rustworkx.sparse_adjacency_matrix(graph))

Although if we did add a scipy dependency (including an optional one) we can probably do this call directly in rust via pyo3 fairly easily.

sambaPython24 commented 7 months ago

@mtreinish, @IvanIsCoding Thank you for your fast response! From the user side, it would be certainly enough to just give out (row, col, data) as a dict or a tuple, since it is very easy to construct sparse matrices or transform there layout in scipy or other, even more optimized python libraries.

Unless you do not want to re-write scipy and its sparse matrix operations in rust as well (which would be amazing but probably a bit of an overkill), adding functionality or its own sparse format does not seem to be necessary.

The implemention of the COO matrix is probably just a loop over the edges and instead of filling the dense matrix, appending it to a list. The reason, I did not do that directly in python is, that in Rust it must be a lot faster which matters for the mentioned, large matrices.