Torrintino / networkit

NetworKit is a growing open-source toolkit for large-scale network analysis.
https://networkit.iti.kit.edu/
MIT License
0 stars 0 forks source link

Rewrite for parallel implementation #6

Open Torrintino opened 5 years ago

Torrintino commented 5 years ago
  1. Write or find function for partitioning the Graph
  2. Rewrite the algorithm to partition the Graphs and then runs ReduceToKTruss in parallel on each partition
Torrintino commented 5 years ago

This seems to implement, what we are looking for: https://networkit.github.io/dev-docs/cppdoc.html#

Torrintino commented 5 years ago

The paper can be found here: http://langevin.univ-tln.fr/cours/PAA/extra/Tarjan-1972.pdf If I don't misunderstand, the complexity should be linear in O(E + N)

Torrintino commented 5 years ago

Since our algorithm is already working, I will start a new branch for the parallel implementation

Torrintino commented 5 years ago

@niklasdeckers I wonder, whether doing the computation in parallel will actually be faster.

Our plan was to call the function ReduceToKTruss(Graph g, count k) for each component. Since we computed the data structures from scratch on for every call, we only would have to split the graph into components. Now my latest change, uses the same data structures across all calls to ReduceToKTruss. That saves a large amount of time, however if we then do the calls in parallel, we will also need to split the queue structure. I think this should be doable in O(|E|), which doesn't theoretically increase the complexity, but in practice we have O(|E| * 4) more operations, which generates a lot of overhead. I think the parallelism might only speed things up on a machine with a lot of CPU cores and a Graph with many small partitions.

What is your opinion on this? We now have an algorithm that is working, tested and takes around half a second for 150 nodes and an edge probability of 0.4 (about 2000 Edges) on my laptop. Are we fine with this, or do we want to implement parallelism? If not, we can still outline our design of the parallel algorithm in the presentation.

niklasdeckers commented 5 years ago

Will work this out now. It might be enough to make a parallelization approach that is only efficient on certain types of graphs (especially the typical ones).

niklasdeckers commented 5 years ago

So it seems to me that extracting biconnected components is not very effective on most "natural" graphs as there might be large areas (that would be very interesting for parallelism) that are connected with very few (but more than one) edges. This means that something like http://weiemmazhang.me/sublinks/papers/4.2.2013SIGMOD.Efficiently%20computing%20k-edge%20connected%20components%20via%20graph%20decomposition.pdf would be the way to go. Nevertheless, this seems not to be implemented in Networkit. I will try to point out the difference empirically using the existing graphs for tomorrow.

The data handling problem remains and we might only gain efficiency when the splitting effort is justified by the graph structure. Will do a few calculations on existing natural graphs - probably we do not really have to implement parallelism.