AnacletoLAB / grape

🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations
MIT License

missing `get_minimum_spanning_tree` and similar #57

Open rtbs-dev opened 7 months ago

rtbs-dev commented 7 months ago

Very impressive library so far. Just wanted to mention here, unless I'm misreading your API docs, that the Graph object doesn't have an implementation of Prim's or Kruskal's minimum/maximum spanning tree. This is the last thing keeping me on e.g. scipy.sparse.csgraph, and was the first thing I looked for here.

Ideally, I would imagine a slightly more useful MST interface that e.g. defaults to the spanning tree for the whole graph, but could accept an array of node activation flags and an (optional) cost matrix to calculate the MST on that induced subgraph. This is part of a simple way to approximate the Steiner tree on those nodes, for instance. If the user doesn't supply a cost matrix, then the metric closure would work (again, if desired; MST on the original graph weights is probably the default).
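To make the interface idea concrete, here is a rough sketch built on scipy.sparse.csgraph rather than on GRAPE. The function name `spanning_tree_on_nodes` and its parameters (`active`, `cost`, `metric_closure`) are hypothetical, and it assumes an undirected, connected graph with strictly positive weights, where a zero entry means "no edge".

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path


def spanning_tree_on_nodes(weights, active=None, cost=None, metric_closure=False):
    """Hypothetical interface sketch, not part of GRAPE or scipy.

    weights        : (n, n) symmetric weight matrix, 0 meaning "no edge".
    active         : optional boolean flags selecting the nodes to span.
    cost           : optional (n, n) cost matrix overriding the edge weights.
    metric_closure : if True and no cost is given, use all-pairs
                     shortest-path distances as costs.
    Returns (node_indices, mst) where mst is a sparse matrix over the
    selected nodes, indexed in the order given by node_indices.
    """
    weights = csr_matrix(weights)
    n = weights.shape[0]
    # Default: span the whole graph.
    idx = np.arange(n) if active is None else np.flatnonzero(active)
    if cost is None:
        if metric_closure:
            # Metric closure: cost of (u, v) is the shortest-path distance.
            cost = shortest_path(weights, directed=False)
        else:
            cost = weights
    # Restrict the cost matrix to the induced subgraph on the active nodes.
    sub = csr_matrix(cost)[idx][:, idx]
    return idx, minimum_spanning_tree(sub)


# Path 0-1-2-3 with unit weights, plus a heavy shortcut edge 0-3.
w = np.zeros((4, 4))
w[0, 1] = w[1, 2] = w[2, 3] = 1.0
w[0, 3] = 5.0
w = w + w.T

flags = [True, False, False, True]
_, whole = spanning_tree_on_nodes(w)
_, induced = spanning_tree_on_nodes(w, active=flags)
_, closure = spanning_tree_on_nodes(w, active=flags, metric_closure=True)
```

With the metric closure, the tree over nodes 0 and 3 uses the shortest-path distance between them instead of the direct heavy edge, which is the first step of the usual MST-based Steiner tree approximation.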

I did find these, but a number expressly say the tree is not minimal:

LucaCappelletti94 commented 7 months ago

What do you mean by "but a number expressly say the tree is not minimal"?

rtbs-dev commented 7 months ago

I mean in the docs. From the second one:

Returns consistent spanning arborescence using Kruskal. The spanning tree is NOT minimal.

From the third:

The spanning tree is NOT minimal. The given random_state is NOT the root of the tree.

And the first seems to look like the second, and never specifies if the arborescence is minimal over the provided edge weights.

LucaCappelletti94 commented 7 months ago

All of these methods will return you the arborescences, which of course have a minimal number of edges. I don't recall whether I implemented one for the weighted case, as I have never needed one. Do you know any good algorithm that scales well?

rtbs-dev commented 7 months ago

Sure; for starters, scipy implements minimum_spanning_tree (in fact, everything in the scipy.sparse.csgraph module would be a great thing to include here!)

The source code there has a reference implementation of Kruskal's algorithm (in a weighted setting).
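For illustration, a minimal pure-Python sketch of weighted Kruskal with a union-find structure follows. This is not scipy's or GRAPE's code; the name `kruskal_mst` and the `(weight, u, v)` edge layout are assumptions made for the example.

```python
def kruskal_mst(num_nodes, edges, maximum=False):
    """Sketch of Kruskal's algorithm (hypothetical helper).

    edges   : iterable of (weight, u, v) tuples for an undirected graph.
    maximum : if True, build a maximum spanning tree instead.
    Returns the list of chosen (weight, u, v) edges; on a disconnected
    graph this is a spanning forest.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Find the component root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Sorting dominates the cost: O(|E| log |E|) = O(|E| log |V|).
    for weight, u, v in sorted(edges, reverse=maximum):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two components, so it cannot close a cycle.
            parent[root_u] = root_v
            tree.append((weight, u, v))
            if len(tree) == num_nodes - 1:
                break
    return tree


# Path 0-1-2-3 with unit weights, plus a heavy shortcut edge 0-3.
example_edges = [(1.0, 0, 1), (1.0, 1, 2), (1.0, 2, 3), (5.0, 0, 3)]
min_tree = kruskal_mst(4, example_edges)
max_tree = kruskal_mst(4, example_edges, maximum=True)
```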

The other option (outside of NetworkX's many implementations, one of which is Borůvka's algorithm) is graphblas, which would be very fast if done directly on the matrix, but I can only find a version of Prim's algorithm in a C++ template repo...nothing for python-graphblas.

rtbs-dev commented 7 months ago

Note that these are all essentially O(|E|log|V|), so they are considered quite fast already. I think there's an expected-linear-time one, as well, e.g. here. But that would probably be more work than it's worth.

My use-case is typically finding MSTs in a metric closure, so Prim's algorithm runs faster (on dense graphs).
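As a rough illustration of why Prim's suits this case: on a dense cost matrix, the array-based variant needs no heap and runs in O(|V|^2), versus O(|V|^2 log|V|) for an edge-sorting approach when |E| is on the order of |V|^2. The sketch below is pure Python; the name `prim_dense` is hypothetical, and it assumes a complete, symmetric cost matrix such as a metric closure.

```python
import math


def prim_dense(cost):
    """Sketch of array-based Prim's algorithm (hypothetical helper).

    cost : square, symmetric list of lists; cost[u][v] is the cost of
           edge (u, v). Use math.inf for a missing edge.
    Returns a list of (parent, child, cost) tree edges, rooted at node 0.
    """
    n = len(cost)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n       # tree endpoint of that connection
    best[0] = 0.0
    tree = []
    for _ in range(n):
        # Linear scan for the cheapest node outside the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        if math.isinf(best[u]):
            raise ValueError("graph is disconnected")
        in_tree[u] = True
        if parent[u] >= 0:
            tree.append((parent[u], u, cost[parent[u]][u]))
        # Relax the connections of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v] and cost[u][v] < best[v]:
                best[v] = cost[u][v]
                parent[v] = u
    return tree


# Metric closure of a unit-weight path on 4 nodes: cost is |i - j|.
closure = [[abs(i - j) for j in range(4)] for i in range(4)]
prim_tree = prim_dense(closure)
```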