CWTSLeiden / networkanalysis

Java package that provides data structures and algorithms for network analysis.
MIT License
145 stars 33 forks source link

java.lang.ArrayIndexOutOfBoundsException when using -q Modularity #19

Open ghost opened 3 years ago

ghost commented 3 years ago

The following command java -cp ~/Bioinformatics/networkanalysis/build/libs/networkanalysis-1.1.0-5-ga3f342d.jar nl.cwts.networkanalysis.run.RunNetworkClustering -q CPM -m 1 -w -o /tmp/edges_clustering.txt edges.txt runs just fine. However, if the -q option is set to "Modularity", I get the following crash:

RunNetworkClustering version 1.1.0 By Vincent Traag, Ludo Waltman, and Nees Jan van Eck Centre for Science and Technology Studies (CWTS), Leiden University Reading edge list from 'edges.txt'. Reading edge list took 0s. Network consists of 18386 nodes and 423926 edges with a total edge weight of 24546.50103795767. Using singleton initial clustering. Running Leiden algorithm. Quality function: Modularity Resolution parameter: 1.0 Minimum cluster size: 1 Number of random starts: 1 Number of iterations: 10 Randomness parameter: 0.01 Random number generator seed: random Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 5 out of bounds for length 5 at nl.cwts.networkanalysis.LocalMergingAlgorithm.findClustering(LocalMergingAlgorithm.java:190) at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:228) at nl.cwts.networkanalysis.LeidenAlgorithm.improveClusteringOneIteration(LeidenAlgorithm.java:276) at nl.cwts.networkanalysis.IterativeCPMClusteringAlgorithm.improveClustering(IterativeCPMClusteringAlgorithm.java:91) at nl.cwts.networkanalysis.run.RunNetworkClustering.main(RunNetworkClustering.java:413)

I've attached the input file here edges.txt I'm not sure what property of this input file would violate the preconditions of the software. The nodes are 0-indexed numbers, there are no duplicate edges, the lefthand node ID is always less than the righthand node ID, etc.

vtraag commented 3 years ago

The problem seems to be caused by the edges with zero weights. For the time being you can circumvent the problem by simply removing those edges. This can be done without any problem since edges with zero weights have no effect anyway.

Nonetheless, this is a bug that should be corrected. We will provide a fix at some later time.

vtraag commented 3 years ago

In particular, the root cause of the problem is that we check whether a neighboring cluster is already added in these conditionals:

https://github.com/CWTSLeiden/networkanalysis/blob/a3f342d2c9db52d35c2af2287ff2433e5cb7b093/src/main/java/nl/cwts/networkanalysis/LocalMergingAlgorithm.java#L188

https://github.com/CWTSLeiden/networkanalysis/blob/a3f342d2c9db52d35c2af2287ff2433e5cb7b093/src/main/java/nl/cwts/networkanalysis/FastLocalMovingAlgorithm.java#L162

https://github.com/CWTSLeiden/networkanalysis/blob/a3f342d2c9db52d35c2af2287ff2433e5cb7b093/src/main/java/nl/cwts/networkanalysis/StandardLocalMovingAlgorithm.java#L153

We should probably use a boolean array isClusterAdded instead of relying on the edge weight. This is the most robust way to address this issue I think. @neesjanvaneck, what do you think?

ghost commented 3 years ago

Ah, I see. Makes perfect sense. I didn't intend to have any edges with zero weight. I'm happy to check for that condition and remove those edges, which are meaningless. Would you like me to close this issue (as far as I'm concerned, it's already resolved)?

vtraag commented 3 years ago

No problem, I imagine that you did not check this condition prior to running the algorithm. No, let's leave the issue open, as the program shouldn't crash on this input.

ghost commented 3 years ago

I suppose a smaller change would simply be to filter out edges with zero weight in the code that reads the edges from the file. This way, you can enforce your own precondition, and leave the downstream code unchanged. The advantage is that it would avoid adding another variable for book-keeping.

vtraag commented 3 years ago

Yes, we would probably do that as well. However, the program still should not crash, even if an edge of zero weight would somehow be included.