Closed dlohmeier closed 5 years ago
(1) good idea
(2) CPython has a recursion deph limit (default 1000?). A recursive implementation consequently breaks at graphs with larger components. So we should avoid recursion here.
Moreover, appending and popping elements at the stack list should not be the bottleneck. Indeed, I tried to use a deque instead (which has O(1) access to first and last element) and it didnt help a lot.
However, I reimplemented connected_component without using exceptions. This implementation is on-par performance wise with your recusive approach.
I will integrate it soon.
I opened a pull request #267 to implement the changes with the for-loop and the call to mg.edges with notravbuses. I chose the deque implementation over the list, in my tests it was just about 3% faster. I had also thought about whether or not the iterator iter(mg[bus])
is really required, as networkx 2.x returns an AdjacencyView when calling mg[bus]
. However, with respect to timing I did not find any difference really. It seems like memory usage is the same with or without iter, but I am not sure, as I didn't manage to run a memory profiler properly.
Can this be closed now that #267 has been merged?
Yes, definitely :-)
I have had a look at the functions
connected_component
andconnected_components
from the topology package again and thought about some performance improvement. There were two things that crossed my mind:In the method
connected_components
a large part of time is wasted on the cornercases of having notravbuses. I suggest the following small change:I simply added the notravbuses to
mg.edges()
in the second loop. This would usually reduce the time drastically, as it searches only some of the edges. Also the if-statement could be left out then, as in case the argument is empty, an empty iterator is returned, so nothing happens.The function
connected_component
spends a large share of time in appending and reading from the created stack list. As the returned iterator cannot be expected to take a specific order, it might make sense to use some set functions. I have implemented a recursive method that collects visited components in a set. It is this way slightly faster thanconnected_component
for the cases that I tested it on.So testing the timings delivers the following results. I always compare the initial implementation with the improvement of adding notravbuses as suggested in (1.) and then using the recursive
find_connected
method.So in case notravbuses are given, the first suggestion decreases the timing from 571 to 446 µs, otherwise it is roughly the same. The recursive implementation of connected component can decrease the timing from ~440µs to ~335µs. Yet I can imagine that there are still better, clearer and maybe more performant ways to do this.
It might be worthwhile checking if this can be implemented, as it makes some difference in large scale analyses.