Nycto / Hasher

A small Scala library for easily generating hashes (md5, sha1, sha256, sha512, crc32, bcrypt, hmacs, pbkdf2)
MIT License
186 stars 40 forks source link

Improve Hash#hex performance. #15

Closed y2k2mt closed 7 years ago

y2k2mt commented 7 years ago

Hello.

Current implementation of Hash#hex is very slow. I improved that performance.

Here is a jmh benchmark results.

Before fix.

[info] Running org.openjdk.jmh.Main -i 3 -wi 3 -f1 -t1
[info] # JMH 1.10.3 (released 503 days ago, please consider updating!)
[info] # VM version: JDK 1.8.0_102, VM 25.102-b14
[info] # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 3 iterations, 1 s each
[info] # Measurement: 3 iterations, 1 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread, will synchronize iterations
[info] # Benchmark mode: Throughput, ops/time
[info] # Benchmark: test.roundeights.hasher.HexStringBenchmarkSpec.hex
[info] 
[info] # Run progress: 0.00% complete, ETA 00:00:06
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 10.897 ops/ms
[info] # Warmup Iteration   2: 46.537 ops/ms
[info] # Warmup Iteration   3: 53.878 ops/ms
[info] Iteration   1: 53.108 ops/ms
[info] Iteration   2: 44.550 ops/ms
[info] Iteration   3: 55.173 ops/ms
[info] 
[info] 
[info] Result "hex":
[info]   50.944 ±(99.9%) 102.761 ops/ms [Average]
[info]   (min, avg, max) = (44.550, 50.944, 55.173), stdev = 5.633
[info]   CI (99.9%): [≈ 0, 153.704] (assumes normal distribution)
[info] 
[info] 
[info] # Run complete. Total time: 00:00:11
[info] 
[info] Benchmark                    Mode  Cnt   Score     Error   Units
[info] HexStringBenchmarkSpec.hex  thrpt    3  50.944 ± 102.761  ops/ms

After fix.

[info] Running org.openjdk.jmh.Main -i 3 -wi 3 -f1 -t1
[info] # JMH 1.10.3 (released 503 days ago, please consider updating!)
[info] # VM version: JDK 1.8.0_102, VM 25.102-b14
[info] # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 3 iterations, 1 s each
[info] # Measurement: 3 iterations, 1 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread, will synchronize iterations
[info] # Benchmark mode: Throughput, ops/time
[info] # Benchmark: test.roundeights.hasher.HexStringBenchmarkSpec.hex
[info]
[info] # Run progress: 0.00% complete, ETA 00:00:06
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 1707.758 ops/ms
[info] # Warmup Iteration   2: 2222.389 ops/ms
[info] # Warmup Iteration   3: 2297.570 ops/ms
[info] Iteration   1: 2605.627 ops/ms
[info] Iteration   2: 2599.048 ops/ms
[info] Iteration   3: 2588.521 ops/ms
[info]
[info]
[info] Result "hex":
[info]   2597.732 ±(99.9%) 157.416 ops/ms [Average]
[info]   (min, avg, max) = (2588.521, 2597.732, 2605.627), stdev = 8.629
[info]   CI (99.9%): [2440.316, 2755.148] (assumes normal distribution)
[info]
[info]
[info] # Run complete. Total time: 00:00:11
[info]
[info] Benchmark                    Mode  Cnt     Score     Error   Units
[info] HexStringBenchmarkSpec.hex  thrpt    3  2597.732 ± 157.416  ops/ms
Nycto commented 7 years ago

Thank you for the contribution! I'll try to take a look at the code this weekend.

Before I do, though, one question: Was this a hotspot you found in your app, or were you benchmarking the lib in isolation when you found this?

y2k2mt commented 7 years ago

I found this issue when I was stress testing our web application that generates hash(output for hex) per request.

At that time I noticed Hash#hex method(uses String#format derived from Regex) consumes a lot of cpu time.

y2k2mt commented 7 years ago

Thank you for taking your time for the review.

I fixed my codes.

Have you tested to see if this function is faster? http://docs.oracle.com/javase/7/docs/api/javax/xml/bind/DatatypeConverter.html#printHexBinary(byte[])

Here is another benchmark result. My implementation is 1.5 times faster.

My implementation #2(review has been applied)

[info] Running org.openjdk.jmh.Main -i 3 -wi 3 -f1 -t1
[info] # JMH 1.10.3 (released 507 days ago, please consider updating!)
[info] # VM version: JDK 1.8.0_102, VM 25.102-b14
[info] # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 3 iterations, 1 s each
[info] # Measurement: 3 iterations, 1 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread, will synchronize iterations
[info] # Benchmark mode: Throughput, ops/time
[info] # Benchmark: test.roundeights.hasher.HexStringBenchmarkSpec.hex
[info] 
[info] # Run progress: 0.00% complete, ETA 00:00:06
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 2303.870 ops/ms
[info] # Warmup Iteration   2: 3040.135 ops/ms
[info] # Warmup Iteration   3: 3240.919 ops/ms
[info] Iteration   1: 3234.058 ops/ms
[info] Iteration   2: 3195.944 ops/ms
[info] Iteration   3: 3246.272 ops/ms
[info] 
[info] 
[info] Result "hex":
[info]   3225.425 ±(99.9%) 478.924 ops/ms [Average]
[info]   (min, avg, max) = (3195.944, 3225.425, 3246.272), stdev = 26.251
[info]   CI (99.9%): [2746.500, 3704.349] (assumes normal distribution)
[info] 
[info] 
[info] # Run complete. Total time: 00:00:11
[info] 
[info] Benchmark                    Mode  Cnt     Score     Error   Units
[info] HexStringBenchmarkSpec.hex  thrpt    3  3225.425 ± 478.924  ops/ms

DatatypeConverter version. Implementation is below.

…
lazy val hex: String = javax.xml.bind.DatatypeConverter.printHexBinary(bytes).toLowerCase
…

and a benchmark result. (if toLowerCase is not required, benchmark indicates 5k - 5.5k ops/ms.)

[info] Running org.openjdk.jmh.Main -i 3 -wi 3 -f1 -t1
[info] # JMH 1.10.3 (released 507 days ago, please consider updating!)
[info] # VM version: JDK 1.8.0_102, VM 25.102-b14
[info] # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/bin/java
[info] # VM options: <none>
[info] # Warmup: 3 iterations, 1 s each
[info] # Measurement: 3 iterations, 1 s each
[info] # Timeout: 10 min per iteration
[info] # Threads: 1 thread, will synchronize iterations
[info] # Benchmark mode: Throughput, ops/time
[info] # Benchmark: test.roundeights.hasher.HexStringBenchmarkSpec.hex
[info] 
[info] # Run progress: 0.00% complete, ETA 00:00:06
[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 1578.760 ops/ms
[info] # Warmup Iteration   2: 2036.710 ops/ms
[info] # Warmup Iteration   3: 2188.365 ops/ms
[info] Iteration   1: 2213.376 ops/ms
[info] Iteration   2: 2169.245 ops/ms
[info] Iteration   3: 2183.678 ops/ms
[info] 
[info] 
[info] Result "hex":
[info]   2188.766 ±(99.9%) 410.503 ops/ms [Average]
[info]   (min, avg, max) = (2169.245, 2188.766, 2213.376), stdev = 22.501
[info]   CI (99.9%): [1778.263, 2599.269] (assumes normal distribution)
[info] 
[info] 
[info] # Run complete. Total time: 00:00:11
[info] 
[info] Benchmark                    Mode  Cnt     Score     Error   Units
[info] HexStringBenchmarkSpec.hex  thrpt    3  2188.766 ± 410.503  ops/ms