ollin18 closed this issue 6 years ago
There is a problem using Clustering.jl's kmeans algorithm for the NBM community detection: we have to cluster into k groups with k-1 vectors, and as you can see that is not possible.
kmeans in the community detection algorithm is used to cluster n vectors of dimensionality k-1 into k groups, so why should this be a problem? kmeans allows arbitrary dimensionality of the vectors, and k-1 is the right value if there are k clear-cut communities, since this information is embedded in eigenvectors 2 through k (ordered by the magnitude of the eigenvalue).
Ok, we want to cluster into k groups with k-1 eigenvectors. Each of these eigenvectors has either n (number of vertices), 2n, or m (number of edges) entries. Since the kmeans function has the line that I pointed out, we can't cluster into k groups with k-1 eigenvectors. I haven't studied your script yet, but I tested it on Zachary's karate club network and I don't quite get the real communities with either kmeans or R_kmeans.
http://nbviewer.jupyter.org/github/ollin18/ML-ITAM/blob/master/Communities_NBM.ipynb
I have written a couple of scripts for this particular method, and for other non-backtracking ones like Newman's flux matrix and the reluctant operator. They are not as elegant as yours, but they give good partitions. They are all here: https://github.com/ollin18/All-InNet if you want to check them out (they are poorly written, with some parts in Spanish).
I think we can build something good together.
Also, the norms of some complex eigenvalues are greater than those of some of the real ones that we can use. Only the real eigenvalues outside the bulk (Wigner's semicircle law) are useful for finding communities.
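To make the "outside the bulk" criterion concrete, here is a minimal numpy sketch (my own illustration, not code from the package): build the 2m x 2m non-backtracking matrix B over directed edges, take its spectral radius rho (its leading eigenvalue, which is real and positive by Perron-Frobenius since B is nonnegative), and keep only the real eigenvalues with modulus above sqrt(rho), the usual estimate of the bulk's radius. The toy graph here (two triangles joined by a bridge) is an assumption for demonstration only.

```python
import numpy as np

# Non-backtracking matrix B over directed edges: B[(u->v), (x->w)] = 1
# iff x == v and w != u (the walk continues without backtracking).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]  # two triangles + bridge
directed = edges + [(v, u) for u, v in edges]
m2 = len(directed)
B = np.zeros((m2, m2))
for i, (u, v) in enumerate(directed):
    for j, (x, w) in enumerate(directed):
        if x == v and w != u:
            B[i, j] = 1.0

vals = np.linalg.eigvals(B)
rho = np.max(np.abs(vals))        # spectral radius = leading (real) eigenvalue
bulk_radius = np.sqrt(rho)        # usual estimate of the bulk's radius

# Only real eigenvalues outside the bulk carry community information.
informative = [v.real for v in vals
               if abs(v.imag) < 1e-8 and abs(v) > bulk_radius + 1e-8]
print(sorted(informative))
```

The leading eigenvalue always passes the test (rho > sqrt(rho) whenever rho > 1); the question for a given network is whether any further real eigenvalues do.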
> Ok, we want to cluster into k groups with k-1 eigenvectors. Each of these eigenvectors has either n (number of vertices), 2n, or m (number of edges) entries. Since the kmeans function has the line that I pointed out, we can't cluster into k groups with k-1 eigenvectors.
Ah ok, maybe I see the point; there is some confusion. I use k-1 eigenvectors of dimension n, but when I give them to kmeans I transpose the matrix, so that I cluster n vectors of size k-1. We should be good there.
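To illustrate the transposition point, a minimal numpy sketch (not the package's actual code; the toy graph and the tiny Lloyd's loop are my assumptions): the k-1 eigenvectors are stacked as columns of an n x (k-1) matrix, so k-means sees n points of dimension k-1, one per vertex.

```python
import numpy as np

# Toy graph: two blocks of 5 nodes, each fully connected, joined by one edge.
n, k = 10, 2
A = np.zeros((n, n))
for block in (range(0, 5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0  # single inter-community edge

# Eigenvectors of A, ordered by eigenvalue magnitude (descending).
vals, vecs = np.linalg.eigh(A)
order = np.argsort(-np.abs(vals))

# The embedding: eigenvectors 2..k as COLUMNS of X, so each of the n rows
# is one point of dimension k-1 -- this is the "transpose" step.
X = vecs[:, order[1:k]]  # shape (n, k-1)

# Minimal Lloyd's k-means on the rows (deterministic init for k = 2).
centers = X[[np.argmin(X[:, 0]), np.argmax(X[:, 0])]]
for _ in range(50):
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])

print(labels)  # the two 5-node blocks receive two distinct labels
```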
I see from the notebook that the non-backtracking method indeed performs very badly on the Zachary network; maybe there is no eigenvalue outside the bulk (except for the first one, which is correlated with the degrees). More probably, though, there is some bug in the function; I'll check that.
I just pushed here another spectral method based on the Bethe Hessian, which has some advantages over the non-backtracking one, one of which is that the relevant eigenvalues are immediately recognizable as the negative ones. Could you try that on the Zachary network?
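For anyone trying this, a hedged numpy sketch of the idea (not the code I pushed; the toy graph and the choice r = sqrt(average degree) are assumptions, the latter being a common regularizer choice): the Bethe Hessian is H(r) = (r^2 - 1) I - r A + D, the number of negative eigenvalues of H estimates the number of communities, and the matching eigenvectors are what goes to k-means.

```python
import numpy as np

# Toy graph: two dense blocks of 5 nodes, joined by one edge.
n = 10
A = np.zeros((n, n))
for block in (range(0, 5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0

D = np.diag(A.sum(axis=1))
c = A.sum() / n                 # average degree
r = np.sqrt(c)                  # common choice of the regularizer

# Bethe Hessian: H(r) = (r^2 - 1) I - r A + D
H = (r**2 - 1) * np.eye(n) - r * A + D

vals, vecs = np.linalg.eigh(H)  # eigenvalues in ascending order
k = int((vals < 0).sum())       # negative eigenvalues <-> communities
print(k)                        # 2 for this two-block graph
embedding = vecs[:, :k]         # rows of this matrix go to k-means
```

The nice property mentioned above is visible here: no bulk-radius threshold is needed, since the informative eigenvalues announce themselves by their sign.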
It would be nice to port all your methods here; please open a PR if you wish and have the time.
I've tested the Bethe Hessian method: it clusters Zachary's network fine, has some trouble with the dolphins network, and seems about right on the football one (it gives 10 partitions; it's commonly divided into 10-12). Here's the notebook:
http://nbviewer.jupyter.org/github/ollin18/ML-ITAM/blob/master/Bethe%20test.ipynb
Also, I found my first sketches of the NBM community detection algorithms. They're not well written, but since they are all in one place I wanted to show them to you.
http://nbviewer.jupyter.org/github/ollin18/ML-ITAM/blob/master/Con%20deltas.ipynb
I'll put some work into the algorithms and open a PR soon.
@ollin18 were you planning on opening a PR for this work? Should I close this issue?
Yes, I'm planning on opening a PR; I just need a couple of days to polish the code.
Closing this out since the code is now in CommunityDetection.jl.
> There is a problem using Clustering.jl's kmeans algorithm for the NBM community detection: we have to cluster into k groups with k-1 vectors, and as you can see that is not possible.
I've been using the R kmeans function via the RCall package; it works perfectly but adds a dependency.