Open jovo opened 9 years ago
If we only remove 1-degree and 2-degree nodes, we probably can't remove many vertices from a graph. For example, there are 1,108,979,537 vertices that has degree smaller than 10 in the page graph. Even if we delete all vertices whose degree is smaller than 10, we can't speed up spectral clustering on the remaining vertices by much.
ok, so maybe that is a bad idea....i imagine some of the others are less bad....
On Wed, Dec 10, 2014 at 7:53 PM, Da Zheng notifications@github.com wrote:
If we only remove 1-degree and 2-degree nodes, we probably can't remove many vertices from a graph. For example, there are 1,108,979,537 vertices that has degree smaller than 10 in the page graph. Even if we delete all vertices whose degree is smaller than 10, we can't speed up spectral clustering on the remaining vertices by much.
— Reply to this email directly or view it on GitHub https://github.com/icoming/FlashGraph/issues/58#issuecomment-66552601.
it's our dream openconnecto.me, jovo.me, my calendar https://www.google.com/calendar/embed?src=joshuav%40gmail.com&ctz=America/New_York
According to "Accelerating Community Detection by Using K-core Subgraphs", the method you described is pretty effective, but I guess its effectiveness should vary from graphs to graphs. Nevertheless, we need to use much larger K. For a graph with smaller than 1M vertices, K=10 might be enough. For a graph with billions of vertices, we might need to use K=100. For example, 3,128,386,131 vertices in the page graph has degree smaller than 100. The question is how well the method can work when K is as large as 100.
I have assigned Louvain and Nerstrand (the third clustering algorithm) to @dmhembe1 . I hope we can have at least one of them. It might be very difficult to debug and verify the code for these two implementations. I'm not sure how useful the fourth clustering algorithm is. It always partitions a graph into two parts.
sounds good.
I was mistaken. The vertex degree is different from k-core. We can't tell how many vertices in 1-core, 2-core, etc from vertex degree. Let's do some experiments before reaching a conclusion.
some potential options:
-- Near linear time algorithm to detect community structures in large-scale networks -- louvain -- Multi-Threaded Modularity Based Graph Clustering using the Multilevel Paradigm -- A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning
@icoming has mentioned some other potentially worth implementing. let's populate a list of potentials, and see whether we want to implement any of them?
note that i could imagine something like the following working very well: (1) find k-core for some reasonable k (eg, remove all 1-degree & 2-degree nodes) (2) augment the adjacency matrix by adding something along the diagonal (3) random projection of resulting "adjacency" matrix (4) k-means on the result
might be competitive. note that (3)+(4) is a kind of randomized spectral clustering. i bet we could prove that (1)+(2)+(3)+(4) is optimal (and robust) with high probability under a (degree-corrected) SBM...