Add a set number of communities to be set by the user

Limiting the maximum number of communities directly in algorithms such as Fast Greedy, Louvain, Girvan-Newman, and Walktrap can be challenging due to their inherent design and method of operation, which do not typically include a parameter to set or cap the number of communities. These algorithms are structured around optimizing certain properties (like modularity or betweenness centrality) and not on achieving a specific number of communities. Likewise, it is not a common practice to predefine a number of communities for detection.

This is not a common practice and we should refrain from finding workarounds to make it work. Alternatively, we could use clustering. I tried implementing both K-means Clustering and Spectral Clustering. I found neither to yield meaningful communities. Here are some more details about these:

K-means Clustering: this algorithm directly takes the number of clusters (k) as a parameter. This algorithm can be applied to network data by considering node features or embedding nodes in a metric space where Euclidean distances can be computed.
Spectral Clustering: this algorithm uses the eigenvalues and eigenvectors of the Laplacian matrix of the graph to reduce dimensions, followed by a clustering method like k-means on the reduced space. The number of clusters is a parameter provided by the user.
1. Steps in Spectral Clustering
  1. Constructing the Laplacian Matrix
  2. Eigenvalue Decomposition
  3. Using Eigenvectors for Clustering
  4. Applying k-means Clustering
2. It operates on the graph Laplacian and uses the eigenvectors of this matrix. The key idea is to project the graph into a lower-dimensional space using the eigenvectors corresponding to the smallest non-trivial eigenvalues and then perform clustering (typically k-means) in this new space. This method is sensitive to the global structure of the graph and can sometimes force a kind of balance among the sizes of the clusters depending on the spectrum of the Laplacian.
3. Spectral clustering can sometimes perform unusually on networks where there are many peripheral nodes or the network is not well connected. The eigenvectors used might correspond to dimensions that isolate these peripheral nodes as they might be far from the main cluster in the spectral space.

Through spectral clustering, the results would not mimic those of other algorithms at all. I got one huge community and then k-1 communities of roughly one node.

CONCLUSION: do not limit the number of communities. This was just a fun pastime :)

QuintineSol / DCiA

Add a set number of communities to be set by the user #83