CWTSLeiden / networkanalysis

Java package that provides data structures and algorithms for network analysis.
MIT License
Warning: NetworkClustering fills the gap nodes from your input network as isolated clusters #6

Closed biowilliam closed 4 years ago

biowilliam commented 5 years ago

Input: an edge list; output: cluster information for each node.

My network contains isolates, so the edge list does not contain any information about them. However, RunNetworkClustering.jar assigns cluster numbers sequentially to all nodes, including those that do not appear in the input edge list. That is, NetworkClustering fills the gaps in the node IDs of your input network with isolated clusters. I have attached my edge list input (node IDs range from 0 to 76, without 36) and the clusters output (which contains a cluster for node 36).

Networkcluster warning example-edgelist.txt clusters.txt

vtraag commented 5 years ago

Just to be sure that I understand what the problem is. If you have an edgelist such as

0 1
3 4

node 2 is an isolate. The algorithm clusters nodes 0-4, and outputs a cluster for each node 0-4, and you end up with something like

0 1
1 1
2 3
3 2
4 2

You would have expected it to not output a cluster for node 2 in this case?

I think this is more easily solved as either a pre-processing step (make sure there are no isolates in your networks) or a post-processing step (ignore the isolates). Other people may actually expect the output to include a cluster for node 2 in such cases (this would also be my expectation). Not outputting the isolates would only complicate matters, I think.
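The post-processing option could look roughly like the sketch below. It is not part of the networkanalysis package: the class and method names (`IsolateFilter`, `nodesInEdgeList`, `dropGapNodes`) are hypothetical, and it assumes that both files are whitespace-separated, with one edge per line in the edge list and one `node cluster` pair per line in the clusters output, as in the example above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the networkanalysis package.
class IsolateFilter {

    /** Collects every node ID that appears in at least one edge line ("source target ..."). */
    static Set<Integer> nodesInEdgeList(List<String> edgeLines) {
        Set<Integer> nodes = new HashSet<>();
        for (String line : edgeLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            nodes.add(Integer.parseInt(parts[0]));
            nodes.add(Integer.parseInt(parts[1]));
        }
        return nodes;
    }

    /** Keeps only the "node cluster" lines whose node occurs in the edge list. */
    static List<String> dropGapNodes(List<String> clusterLines, Set<Integer> knownNodes) {
        List<String> kept = new ArrayList<>();
        for (String line : clusterLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int node = Integer.parseInt(trimmed.split("\\s+")[0]);
            if (knownNodes.contains(node)) {
                kept.add(trimmed);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed sample data: the edge list and clusters output from the example above.
        List<String> edges = List.of("0 1", "3 4");
        List<String> clusters = List.of("0 1", "1 1", "2 3", "3 2", "4 2");
        // The line for node 2 is dropped because node 2 never occurs in an edge.
        dropGapNodes(clusters, nodesInEdgeList(edges)).forEach(System.out::println);
    }
}
```

In practice the two lists would be read from the actual files, for example with `java.nio.file.Files.readAllLines`.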

Or would you simply prefer to get a warning that there are isolates or something?

biowilliam commented 5 years ago

You understood the issue correctly. Yes, I agree that it is better to include the isolates in the final output, as is done now, ideally with a warning that highlights the isolated nodes, especially for those who are new to network analysis. I spent quite some time figuring out the mismatch and just wanted to share what I found with the community. Thanks for your wonderful continuous improvement from SLM to Leiden.
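Until such a warning exists, a similar check can be run on the edge list before clustering. The following is a minimal sketch and not code from the networkanalysis package: the class `GapNodeCheck` and the method `findGapNodes` are hypothetical names, and it assumes that node IDs are non-negative integers that the clustering treats as running from 0 up to the largest ID in the edge list.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of the networkanalysis package.
class GapNodeCheck {

    /** Returns, in ascending order, the IDs in 0..maxId that do not occur in any edge. */
    static List<Integer> findGapNodes(List<int[]> edges) {
        BitSet seen = new BitSet();
        for (int[] edge : edges) {
            seen.set(edge[0]);
            seen.set(edge[1]);
        }
        List<Integer> gaps = new ArrayList<>();
        // seen.length() is the highest node ID plus one, i.e. the assumed node count.
        for (int id = seen.nextClearBit(0); id < seen.length(); id = seen.nextClearBit(id + 1)) {
            gaps.add(id);
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Assumed sample data: the two-edge example from this thread.
        List<int[]> edges = List.of(new int[] {0, 1}, new int[] {3, 4});
        List<Integer> gaps = findGapNodes(edges);
        if (!gaps.isEmpty()) {
            System.err.println("Warning: " + gaps.size()
                    + " node ID(s) do not occur in the edge list and would appear as isolated clusters: "
                    + gaps);
        }
    }
}
```

For the attached edge list (node IDs 0 to 76 without 36), a check like this would be expected to report node 36.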