While running Linearlite multi-project, I noticed unsubscribes took a very long time, and I found that the Graph.delete_vertices operation is what takes the most time to run by far, not scaling very well.
It doesn't make sense to me that deleting a bunch of vertices from a tree graph would take this long especially when traversing the whole graph is fast, so reconstructing it should really not be too much trouble (?), so my conjecture is that the delete_vertices implementation which interacts with a map quite a lot is just suboptimal.
The proposed change is a minimal diff to do the same thing which with the same workload I'm observing >100x improvement in speed with much better scaling properties, but obviously when you get this kind of speedup it casts doubt on its correctness.
Instead of deleting vertices, we keep track of all edges to be removed, as well as vertices to be removed - then run Graph.remove_edge which seems to have a fairly optimal implementation for all edges to be popped, and instead of removing the vertices with Graph.delete_vertices we use Graph.subgraph with a diff of the graph's existing vertices minus the ones to be removed, which should generate a maximally connected subgraph containing the specified vertices.
Since the request related edges have been removed, along with the disconnected vertices, the subgraph should be (?) the same as what we were producing before, but I suspect because of the implementation of Graph.subgraph being more aimed at recreating a graph from scratch through traversing the existing one rather than modifying the existing one it is more efficient. edit: I suspect also that since once a vertex that forms a subtree that is going to be removed entirely is marked as removed, constructing a subgraph ignores the whole subtree whereas delete_vertices will try to clean up the subtree that we're going to entirely ignore anyway (?)
Even if there is an issue with the correctness of this approach, I believe we should be able to traverse the graph from the root to recreate a graph without the specified edges and vertices much faster than with Graph.delete_vertices, even if it requires a more custom implementation.
Testing with Linearlite varying number of issues and comments, split over 3 projects/shapes.
The unsubscribe Shapes.SentRowsGraph.pop_by_request_ids call takes:
Addresses VAX-1977
While running Linearlite multi-project, I noticed unsubscribes took a very long time, and I found that the
Graph.delete_vertices
operation is what takes the most time to run by far, not scaling very well.It doesn't make sense to me that deleting a bunch of vertices from a tree graph would take this long especially when traversing the whole graph is fast, so reconstructing it should really not be too much trouble (?), so my conjecture is that the
delete_vertices
implementation which interacts with a map quite a lot is just suboptimal.The proposed change is a minimal diff to do the same thing which with the same workload I'm observing >100x improvement in speed with much better scaling properties, but obviously when you get this kind of speedup it casts doubt on its correctness.
Instead of deleting vertices, we keep track of all edges to be removed, as well as vertices to be removed - then run
Graph.remove_edge
which seems to have a fairly optimal implementation for all edges to be popped, and instead of removing the vertices withGraph.delete_vertices
we useGraph.subgraph
with a diff of the graph's existing vertices minus the ones to be removed, which should generate a maximally connected subgraph containing the specified vertices.Since the request related edges have been removed, along with the disconnected vertices, the subgraph should be (?) the same as what we were producing before, but I suspect because of the implementation of
Graph.subgraph
being more aimed at recreating a graph from scratch through traversing the existing one rather than modifying the existing one it is more efficient. edit: I suspect also that since once a vertex that forms a subtree that is going to be removed entirely is marked as removed, constructing a subgraph ignores the whole subtree whereasdelete_vertices
will try to clean up the subtree that we're going to entirely ignore anyway (?)Even if there is an issue with the correctness of this approach, I believe we should be able to traverse the graph from the root to recreate a graph without the specified edges and vertices much faster than with
Graph.delete_vertices
, even if it requires a more custom implementation.Testing with Linearlite varying number of issues and comments, split over 3 projects/shapes. The unsubscribe
Shapes.SentRowsGraph.pop_by_request_ids
call takes:For 3k rows / shape (single shape)
For 10k rows / shape (single shape)
For 20k rows / shape (multi-shape)
Old
New