dice-group / Lemming

LEMMING is an ExaMple MImickiNg graph Generator
GNU Affero General Public License v3.0
4 stars 6 forks source link

Improve speed of triangle counting #2

Open MichaelRoeder opened 7 years ago

MichaelRoeder commented 7 years ago

Task description

Lemming already implements the counting of triangles in a given graph as a possible metric. At the moment, the implemented algorithm iterates over the vertices of the graph and tries to find all triangles in which the current vertex is part of. This is done by iterating over the edges that are starting or ending at the current vertex (the direction of edges is not interesting for counting triangles) and checking whether the other two vertices two edges have are connected via a third edge. The algorithm does not take triangles into account that involve a vertex that has been seen before (i.e., that has a lower ID than the current vertex) to make sure that every triangle is counted only once. The implementation can be found at https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java#L41-L75.

Although the implementation is already parallelizing the problem, its complexity grows too fast for dense graphs. We would like to have additional implementations for counting triangles and a logic, that chooses a triangle counting approached based on the input graph (e.g., based on its density).

The project is mainly based on the Grph library.

Possible approaches

http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf gives an overview of different approaches. Our current approach is the edge-count and forward is an enhancement of this approach.

  1. forward from http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf
  2. matrix-multiplication is a different approach. It is based on the adjacency matrix A of the given graph and follows an algorithm like http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/. Unfortunately, the matrix might be very big and space limitations might become a problem for large graphs.
  3. ayz might be a very good choice. However, for this approach the matrix-multiplication has to be implemented at first.
  4. In https://www.cs.cmu.edu/~glmiller/Publications/Papers/Tsourakakis-KMF-09.pdf the authors propose a sampling based on that the triangles can be estimated faster without loosing too much accuracy. This would fit to our already existing implementation.

Testing

Apart from testing the correctness (with some delta for approaches that do not reach 100% correctness, the simple JUnit test class could be rused), the runtime have been checked as well. When commenting out all other metrics, only adding the triangle counting and executing the RefinementTest class it should be finished in a reasonable amount of time. For testing, the semantic web dog food data is necessary. It is provided at http://hobbitdata.informatik.uni-leipzig.de/lemming/SemanticWebDogFood-until-2015.zip

Forking / Merging

The solution should be implemented in a branch, forked from the master branch of the project. The pull request should merge the solution back into the master branch.

Responsible contact

Michael Röder

MichaHoffmann commented 7 years ago

Check out DUOLION paper! Seems to fit very well here

Am 18.09.2017 2:56 nachm. schrieb "Michael Röder" <notifications@github.com

:

Task description

Lemming already implements the counting of triangles in a given graph as a possible metric. At the moment, the implemented algorithm iterates over the vertices of the graph and tries to find all triangles in which the current vertex is part of. This is done by iterating over the edges that are starting or ending at the current vertex (the direction of edges is not interesting for counting triangles) and checking whether the other two vertices two edges have are connected via a third edge. The algorithm does not take triangles into account that involve a vertex that has been seen before (i.e., that has a lower ID than the current vertex) to make sure that every triangle is counted only once. The implementation can be found at https://github.com/dice-group/Lemming/blob/master/src/main/ java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java# L41-L75.

Although the implementation is already parallelizing the problem, its complexity grows too fast for dense graphs. We would like to have additional implementations for counting triangles and a logic, that chooses a triangle counting approached based on the input graph (e.g., based on its density). Possible approaches

http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf gives an overview of different approaches. Our current approach is the edge-count and forward is an enhancement of this approach.

  1. forward from http://i11www.iti.kit.edu/ extra/publications/sw-fclt-05_t.pdf http://i11www.iti.kit.edu/extra/publications/sw-fclt-05_t.pdf
  2. matrix-multiplication is a different approach. It is based on the adjacency matrix A of the given graph and follows an algorithm like http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/ http://www.geeksforgeeks.org/number-of-triangles-in-a-undirected-graph/. Unfortunately, the matrix might be very big and space limitations might become a problem for large graphs.
  3. ayz might be a very good choice. However, for this approach the matrix-multiplication has to be implemented at first.
  4. In https://www.cs.cmu.edu/~glmiller/Publications/Papers/ Tsourakakis-KMF-09.pdf the authors propose a sampling based on that the triangles can be estimated faster without loosing too much accuracy. This would fit to our already existing implementation.

Testing

Apart from testing the correctness (with some delta for approaches that do not reach 100% correctness, the simple JUnit test class https://github.com/dice-group/Lemming/blob/master/src/test/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetricTest.java could be rused), the runtime have been checked as well. When commenting out all other metrics https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/tools/RefinementTest.java#L53-L59, only adding the triangle counting and executing the RefinementTest https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/tools/RefinementTest.java class it should be finished in a reasonable amount of time. Forking / Merging

The solution should be implemented in a branch, forked from the master branch of the project. The pull request should merge the solution back into the master branch. Responsible contact

Michael Röder http://cs.uni-paderborn.de/ds/team/team/

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dice-group/Lemming/issues/2, or mute the thread https://github.com/notifications/unsubscribe-auth/AQpKf8KFmQ68vVUJQxdrwVAsHG2D0trWks5sjmhpgaJpZM4Pa3U3 .

MichaelRoeder commented 7 years ago

That is the fourth point in the description :wink: But thanks for the hint :smiley: