KilianB closed this issue 5 years ago.
For hashes shorter than 64 bits, we could also fall back to `long` variables and use primitive bit shifting.
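Here is a minimal sketch of the `long`-based idea. It assumes the hash bits arrive as a `boolean[]` in most-significant-first order, and `LongHashSketch` is a hypothetical name, not part of the library. A `long` holds exactly 64 bits, so each bit is appended with a shift and an OR. The Hamming distance then reduces to `Long.bitCount` on the XOR of two hashes.

```java
/**
 * Hypothetical sketch: keep a hash of at most 64 bits in a primitive long
 * instead of a BigInteger. Names are illustrative, not the library's API.
 */
class LongHashSketch {

    /** Appends bits most significant first; at most 64 bits fit into a long. */
    static long buildHash(boolean[] bits) {
        if (bits.length > 64) {
            throw new IllegalArgumentException("a long holds at most 64 bits");
        }
        long hash = 0L;
        for (boolean bit : bits) {
            // Make room for the next bit, then set the lowest bit if needed.
            hash = (hash << 1) | (bit ? 1L : 0L);
        }
        return hash;
    }

    /** Hamming distance: number of bit positions in which the two hashes differ. */
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long h1 = buildHash(new boolean[] {true, false, true, true}); // 0b1011
        long h2 = buildHash(new boolean[] {true, true, true, false}); // 0b1110
        System.out.println(h1);                      // prints 11
        System.out.println(hammingDistance(h1, h2)); // prints 2
    }
}
```

A full 64-bit hash whose top bit is set is stored as a negative `long`. `Long.bitCount` still counts the differing bits correctly, but displaying such a value as a number needs `Long.toUnsignedString`.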
JMH benchmarks. `BigInteger` is the current implementation. Throughput decreases as the hash length increases:
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
kilianB.HashCreationBenchmark.benchmarkAverageHash32BigInteger | thrpt | 25 | 152626.545 | ± 2741.163 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash32BigIntegerStringBuilder | thrpt | 25 | 158690.519 | ± 2111.147 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash32MutableBigInteger | thrpt | 25 | 157546.217 | ± 2026.677 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash128BigInteger | thrpt | 25 | 104932.391 | ± 1387.338 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash128BigIntegerStringBuilder | thrpt | 25 | 117392.216 | ± 1035.051 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash128MutableBigInteger | thrpt | 25 | 116409.836 | ± 2436.276 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash5000BigInteger | thrpt | 25 | 1387.469 | ± 6.534 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash5000BigIntegerStringBuilder | thrpt | 25 | 7690.280 | ± 96.870 | ops/s |
kilianB.HashCreationBenchmark.benchmarkAverageHash5000MutableBigInteger | thrpt | 25 | 7715.949 | ± 49.220 | ops/s |
`MutableBigInteger` is package-private in `java.math`, so using it requires method handles and reflective access. That is not warranted given the small performance difference compared to the `StringBuilder` approach, and it would also need unit tests to guarantee that the implementation works as expected.
A new `StringBuilder` is created for every hash. Caching a `StringBuilder` would yield a further performance gain, but at the cost of thread safety. Compared to file I/O, which is far more expensive, that gain is small, so the tradeoff isn't worth it.
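Here is a minimal sketch of the per-call `StringBuilder` approach. It assumes the bits are available as a `boolean[]` (most significant first), and `StringBuilderHashSketch` is a hypothetical name. The builder is a local variable, so every call allocates its own and concurrent callers never share state. The bit string is parsed into a single `BigInteger` at the end, instead of creating one `BigInteger` per bit.

```java
import java.math.BigInteger;

/** Hypothetical sketch of building a hash via a per-call StringBuilder. */
class StringBuilderHashSketch {

    static BigInteger buildHash(boolean[] bits) {
        if (bits.length == 0) {
            return BigInteger.ZERO; // new BigInteger("", 2) would throw
        }
        // Local builder: allocated per call, so the method stays thread safe.
        StringBuilder sb = new StringBuilder(bits.length);
        for (boolean bit : bits) {
            sb.append(bit ? '1' : '0');
        }
        // One parse at the end instead of one BigInteger per bit.
        return new BigInteger(sb.toString(), 2);
    }

    public static void main(String[] args) {
        System.out.println(buildHash(new boolean[] {true, false, true, true})); // prints 11
    }
}
```

`new BigInteger(s, 2)` drops leading zeros, so the hash length has to be tracked separately from the numeric value.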
Migrating to either the `StringBuilder` or the `MutableBigInteger` approach lets us drop the first pass that was needed to estimate the hash length for each algorithm.
Fixed with afc026dd79ae1e7ba721f566896a5a1f56385a39.
Using a custom hash builder that manipulates a byte array directly, we gain even more performance (d9ef1f86984d008b1f23fbef1efe03c9b8cae4df). Even the 5000-bit hash now reaches about 13,000 hashes per second, up from roughly 1,300.
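The following is only a sketch of the byte-array idea, not the code from that commit. `ByteArrayHashBuilder` is a hypothetical name, and the sketch assumes the total bit length is known up front and that bits are appended most significant first. Each bit is set in place with a shift and an OR, and `new BigInteger(1, bytes)` interprets the big-endian array as a non-negative magnitude without any per-bit allocation.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch of a hash builder that writes bits straight into a byte array.
 * Names are illustrative and not the library's API.
 */
class ByteArrayHashBuilder {
    private final byte[] bytes;   // big-endian storage for the hash bits
    private final int length;     // total number of bits in the hash
    private int position = 0;     // number of bits appended so far

    ByteArrayHashBuilder(int bitLength) {
        this.length = bitLength;
        this.bytes = new byte[(bitLength + 7) / 8];
    }

    /** Appends one bit, most significant bit of the hash first. */
    ByteArrayHashBuilder append(boolean bit) {
        if (position >= length) {
            throw new IllegalStateException("hash already holds " + length + " bits");
        }
        if (bit) {
            // Distance from the least significant end keeps the value right-aligned.
            int fromRight = length - 1 - position;
            int byteIndex = bytes.length - 1 - fromRight / 8;
            bytes[byteIndex] |= (byte) (1 << (fromRight % 8));
        }
        position++;
        return this;
    }

    /** signum 1: the byte array is read as an unsigned big-endian magnitude. */
    BigInteger toBigInteger() {
        return new BigInteger(1, bytes);
    }

    public static void main(String[] args) {
        ByteArrayHashBuilder builder = new ByteArrayHashBuilder(4);
        builder.append(true).append(false).append(true).append(true);
        System.out.println(builder.toBigInteger()); // prints 11
    }
}
```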
Fixed in v3.0.0.
During hash creation, a `BigInteger` object is created for each bit. Currently, nothing done during creation justifies this overhead. Afterwards, however, an XOR on hashes of arbitrary length is required to calculate the Hamming distance.
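To illustrate both points, here is a hypothetical per-bit construction (not necessarily the library's original code) together with the XOR-based Hamming distance that still benefits from arbitrary-length arithmetic. `PerBitBigIntegerSketch` is an illustrative name. Because `BigInteger` is immutable, each `shiftLeft` and `setBit` call returns a result object rather than modifying the existing value.

```java
import java.math.BigInteger;

/** Hypothetical sketch of per-bit BigInteger construction and Hamming distance. */
class PerBitBigIntegerSketch {

    /** Builds the hash one bit at a time, most significant bit first. */
    static BigInteger buildHashPerBit(boolean[] bits) {
        BigInteger hash = BigInteger.ZERO;
        for (boolean bit : bits) {
            // Immutable: the shifted value replaces hash, typically a fresh allocation.
            hash = hash.shiftLeft(1);
            if (bit) {
                // Setting the lowest bit produces yet another object.
                hash = hash.setBit(0);
            }
        }
        return hash;
    }

    /** Hamming distance of two non-negative hashes of arbitrary length. */
    static int hammingDistance(BigInteger a, BigInteger b) {
        return a.xor(b).bitCount();
    }

    public static void main(String[] args) {
        BigInteger h1 = buildHashPerBit(new boolean[] {true, false, true, true}); // 11
        BigInteger h2 = buildHashPerBit(new boolean[] {true, true, true, false}); // 14
        System.out.println(hammingDistance(h1, h2)); // prints 2
    }
}
```

`BigInteger.bitCount` counts the bits that differ from the sign bit. For the non-negative hashes used here, that equals the number of set bits.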
Benchmark whether using a `StringBuilder` is faster.
vs new
In conjunction with #18.