Closed vtraag closed 1 year ago
I went through all new classes to conform to the code style used throughout this package.
I rebased the PR against the master
branch so that there are no merge conflicts and force-pushed to this PR. Apparently the GitHub actions do not run in the case of merge conflicts.
See 15cf64289afd05ecc2da744cefcdf4bbb63f48f5, 94c72d274685c0d2a83446ff8b9bc52885fddaab, 472d10b274fc9124e812d1773bdbf945fc7588c5, and f83fe56df4359e16e9c26e5588b81ba389a4b513 for some bug fixes and enhancements.
This PR adds 64-bits capabilities to the
networkanalysis
package because natives arrays in Java are unfortunately limited to 32-bits.There are four new classes added to the
nl.cwts.util
package:LargeBooleanArray
LargeDoubleArray
LargeLongArray
LargeIntArray
These classes enable 64-bits indices (i.e.
long
s) which are translated to twoint
s (using bitwise operators) that index an internal array of arrays. The first array index is called thesegment
and the second array index is called theoffset
. The value for a providedlong index
is then located invalues[getSegment(index)][getOffset(index)]
.Additionally, these classes replace the old
DynamicIntArray
andDynamicDoubleArray
, since these also need 64-bits support. Hence, all the 64-bits arrays are now also dynamic and can be resized (seeensureCapacity
,shrink
andresize
) and values can be added/removed dynamically (seeappend
,push
andpop
).Additionally, these classes needed support for some other functionalities, which are also all implemented. There is one functionality (sorting) for which I thought it would be more straightforward to use an existing implementation (
it.unimi.dsi.fastutil.BigArrays
). Essentially this uses a similar setup in the background (array of arrays). The newly created classes are essentially similar to theBigArrays
implementation infastutil
but provide more convenience functions, so that it is easier to translate the rest of the package to be able to use these 64-bits capabilities. I added thefastutil
dependency to thebuild.gradle
file, and it is now automatically bundled in the resultingjar
file.All necessary modifications to all other classes are also made. This essentially entails changing
array[i]
notations intoarray.get(i)
orarray.set(i, ...)
notations.Sometimes there is a more convenient form of iteration possible that using a for-each idiom,
for(double x : array)
which can be substantially easier to understand. Additionally, it is faster than iterating overlong
indices that have to be separated into two separateint
s each time. This allows to change loops of the kindby the simpler and more elegant construct
without a performance hit, and even a performance increase.
Unfortunately, in many places these constructions are not immediately useful, because we need to iterate over both the
neighbors
and theedgeWeights
simultaneously, which is not possible in this idiom in Java (at least not in an elegant manner). Hence, except for one construct I now left all for loops as they were.In addition, I also added some similar convenience iterators in the
Network
class, allowing to iterate overneighbors
,edgeWeights
orincidentEdges
easily. There is the possibility to replace many loops of the kind ofby
which is substantially easier to understand. This will come at a slight performance hit probably, but I haven't tested this change explicitly. At any rate, I left these for loops as they were originally, and we might look into these possibilities at some later time.
Finally, I included unit tests for all these new classes, including tests for all methods. Additionally, I added a simply unit test for the
LeidenAlgorithm
and a check to see if the aggregation and quality calculations are consistent. These are automatically run when calling./gradlew build
. I have also verified the new classes with sizes of twice the maximum size of individual arrays (which is a total size of2 147 483 648
, exceeding the maximum length of a 32-bits arrays by a few elements). This revealed one remaining problem in the tests themselves (i.e. not in the code, but some test was incorrect), which I am re-testing now, but I do not expect any problems anymore.I tested this implementation against an example graph . The original implemetation took 40s, whereas the new implementation took 51s on my laptop. This performance decrease is of course a pity, but without investing substantially more time in rewriting aspects of the code, it is unlikely that we are able to improve the performance substantially. But suggestions are of course welcome.
Comments are welcome!