browserify / sha.js

Streamable SHA hashes in pure javascript

sha256/512: extract branches out #19

Closed dcousens closed 9 years ago

dcousens commented 9 years ago

Exact same as #18, but for SHA256.
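The diff itself is not reproduced in this thread. As a rough sketch of what "extracting branches out" of a hash loop can look like, the example below assumes the branch in question is the `j < 16` test in the SHA-256 message schedule. That is an assumption, and `scheduleBranchy` and `scheduleSplit` are hypothetical names, not sha.js functions. The first variant tests the index on every iteration; the second splits the work into two loops so neither contains the test.

```javascript
// Small sigma functions from the SHA-256 specification (FIPS 180-4).
function rotr (x, n) { return (x >>> n) | (x << (32 - n)) }
function sigma0 (x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >>> 3) }
function sigma1 (x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >>> 10) }

// Variant 1: a single loop with a branch evaluated on every iteration.
function scheduleBranchy (block) {
  var W = new Array(64)
  for (var j = 0; j < 64; j++) {
    W[j] = j < 16
      ? block.readInt32BE(j * 4)
      : (sigma1(W[j - 2]) + W[j - 7] + sigma0(W[j - 15]) + W[j - 16]) | 0
  }
  return W
}

// Variant 2: the same schedule, split into two loops with no index test inside.
function scheduleSplit (block) {
  var W = new Array(64)
  var j
  for (j = 0; j < 16; j++) W[j] = block.readInt32BE(j * 4)
  for (; j < 64; j++) {
    W[j] = (sigma1(W[j - 2]) + W[j - 7] + sigma0(W[j - 15]) + W[j - 16]) | 0
  }
  return W
}

// Sample input only: one 64-byte block filled with 0x61.
var block = Buffer.alloc(64, 0x61)
console.log(scheduleBranchy(block).join() === scheduleSplit(block).join())
```

Both variants produce the same 64 words; whether the split version is measurably faster depends on the engine, which is what the numbers below try to establish.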

This pull request:

run (N), input-size (bytes), throughput (bytes/ms), time (ms)
1, 1, 27.13, 0.036859565057132324
3, 2, 62.42, 0.03204101249599487
5, 3, 80.94, 0.037064492216456635
7, 4, 117.56, 0.03402517863218782
8, 5, 132.05, 0.03786444528587656
9, 6, 211.14, 0.02841716396703609
10, 7, 288.61, 0.02425418384671356
11, 9, 355.59, 0.025310048089091368
12, 11, 457.16, 0.02406159769008662
13, 14, 448.98, 0.031181789834736514
14, 17, 502.69, 0.033818058843422386
15, 21, 800.52, 0.026232948583420776
16, 25, 932.75, 0.02680246582685607
17, 31, 944.26, 0.03282994090610637
18, 38, 1182.56, 0.032133676092544985
19, 46, 1280.18, 0.035932446999640676
20, 56, 1330.56, 0.04208754208754209
21, 69, 1672.56, 0.041254125412541254
22, 84, 1783.32, 0.047103155911446065
23, 103, 2147.55, 0.047961630695443645
24, 126, 1784.16, 0.07062146892655367
25, 154, 2146.76, 0.07173601147776183
26, 188, 2340.6, 0.08032128514056225
27, 230, 2651.9, 0.08673026886383348
28, 282, 3099.18, 0.09099181073703366
29, 345, 3232.65, 0.10672358591248667
30, 422, 2667.04, 0.15822784810126583
31, 516, 2714.16, 0.19011406844106463
32, 631, 2959.39, 0.21321961620469082
33, 772, 2045.8, 0.37735849056603776
34, 944, 3247.36, 0.29069767441860467
35, 1155, 3465, 0.3333333333333333
36, 1413, 3292.29, 0.4291845493562232
37, 1728, 3404.16, 0.5076142131979695
38, 2113, 2937.07, 0.7194244604316546
39, 2585, 3929.2, 0.6578947368421053
40, 3162, 3731.16, 0.847457627118644
41, 3868, 2823.64, 1.36986301369863
42, 4732, 2555.28, 1.8518518518518519
43, 5788, 2005.7425742574258, 2.8857142857142857
44, 7079, 3854.90099009901, 1.8363636363636364
45, 8660, 3810.4, 2.272727272727273
46, 10593, 2936.6732673267325, 3.607142857142857
47, 12957, 3976.90099009901, 3.2580645161290325
48, 15849, 3884.5588235294117, 4.08
49, 19387, 3041.098039215686, 6.375
50, 23714, 3487.3529411764707, 6.8
51, 29007, 3159.1782178217823, 9.181818181818182
52, 35481, 3041.2285714285713, 11.666666666666666
53, 43401, 3867.4158415841584, 11.222222222222221
54, 53088, 3473.0467289719627, 15.285714285714286
55, 64938, 3952.7478260869566, 16.428571428571427
56, 79433, 3394.5726495726494, 23.4
57, 97163, 3810.3137254901962, 25.5
58, 118850, 3047.4358974358975, 39
59, 145378, 3792.4695652173914, 38.333333333333336
60, 177828, 3014.0338983050847, 59
61, 217520, 3625.3333333333335, 60
62, 266073, 3389.4649681528663, 78.5
63, 325462, 4093.8616352201257, 79.5
64, 398107, 2764.6319444444443, 144
65, 486968, 4092.168067226891, 119
66, 595662, 3588.3253012048194, 166
67, 728618, 3238.302222222222, 225
68, 891251, 3825.1115879828326, 233
69, 1090184, 3879.658362989324, 281
70, 1333521, 3968.8125, 336
71, 1631173, 3940.0314009661834, 414
72, 1995262, 3859.3075435203095, 517
73, 2440619, 3904.9904, 625
74, 2985383, 4061.7455782312927, 735
75, 3651741, 4026.175303197354, 907
76, 4466836, 3935.5383259911896, 1135
77, 5463865, 3440.721032745592, 1588
78, 6683439, 4028.5949367088606, 1659
79, 8175230, 3999.623287671233, 2044
80, 10000000, 4100.041000410004, 2439

Version 2.3.4, for comparison:

run (N), input-size (bytes), throughput (bytes/ms), time (ms)
1, 1, 10.26, 0.09746588693957114
3, 2, 56.28, 0.03553660270078181
5, 3, 100.14, 0.029958058717795086
7, 4, 93.6, 0.042735042735042736
8, 5, 120.9, 0.0413564929693962
9, 6, 140.1, 0.042826552462526764
10, 7, 240.03, 0.029163021289005542
11, 9, 321.75, 0.027972027972027972
12, 11, 337.59, 0.03258390355164549
13, 14, 417.34, 0.03354579000335458
14, 17, 342.89, 0.0495785820525533
15, 21, 542.01, 0.03874467260751647
16, 25, 848.75, 0.029455081001472753
17, 31, 819.95, 0.03780718336483932
18, 38, 1000.54, 0.0379794910748196
19, 46, 967.38, 0.0475511174512601
20, 56, 855.12, 0.06548788474132286
21, 69, 1122.63, 0.06146281499692686
22, 84, 1148.28, 0.07315288953913679
23, 103, 1426.55, 0.07220216606498195
24, 126, 1363.32, 0.09242144177449169
25, 154, 1587.74, 0.09699321047526673
26, 188, 1564.16, 0.1201923076923077
27, 230, 1874.5, 0.12269938650306748
28, 282, 2129.7714285714287, 0.13240857503152584
29, 345, 1693.95, 0.20366598778004075
30, 422, 2489.8, 0.1694915254237288
31, 516, 2610.96, 0.1976284584980237
32, 631, 2202.19, 0.28653295128939826
33, 772, 1945.44, 0.3968253968253968
34, 944, 2747.04, 0.3436426116838488
35, 1155, 2529.45, 0.45662100456621
36, 1413, 2176.02, 0.6493506493506493
37, 1728, 2471.04, 0.6993006993006993
38, 2113, 1753.79, 1.2048192771084338
39, 2585, 2869.35, 0.9009009009009009
40, 3162, 3067.14, 1.0309278350515463
41, 3868, 2221.227722772277, 1.7413793103448276
42, 4732, 2764.237623762376, 1.7118644067796611
43, 5788, 2373.08, 2.4390243902439024
44, 7079, 2760.81, 2.5641025641025643
45, 8660, 2944.4, 2.9411764705882355
46, 10593, 2596.323529411765, 4.08
47, 12957, 2461.83, 5.2631578947368425
48, 15849, 2796.8823529411766, 5.666666666666667
49, 19387, 2879.2574257425745, 6.733333333333333
50, 23714, 2258.4761904761904, 10.5
51, 29007, 2462.8584905660377, 11.777777777777779
52, 35481, 2434.970588235294, 14.571428571428571
53, 43401, 2949.5825242718447, 14.714285714285714
54, 53088, 2413.090909090909, 22
55, 64938, 2319.214285714286, 28
56, 79433, 2269.5142857142855, 35
57, 97163, 2724.196261682243, 35.666666666666664
58, 118850, 2785.546875, 42.666666666666664
59, 145378, 2363.869918699187, 61.5
60, 177828, 2654.1492537313434, 67
61, 217520, 2605.0299401197603, 83.5
62, 266073, 2558.394230769231, 104
63, 325462, 2781.7264957264956, 117
64, 398107, 2764.6319444444443, 144
65, 486968, 2081.059829059829, 234
66, 595662, 2046.9484536082475, 291
67, 728618, 2335.3141025641025, 312
68, 891251, 2629.0589970501474, 339
69, 1090184, 3105.937321937322, 351
70, 1333521, 2778.16875, 480
71, 1631173, 2643.7163695299837, 617
72, 1995262, 2895.880986937591, 689
73, 2440619, 3005.68842364532, 812
74, 2985383, 2392.133814102564, 1248
75, 3651741, 2961.6715328467153, 1233
76, 4466836, 2977.8906666666667, 1500
77, 5463865, 3078.2338028169015, 1775
78, 6683439, 2770.9116915422887, 2412
79, 8175230, 2847.5200278648554, 2871
80, 10000000, 2675.227394328518, 3738
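
The harness that produced these tables is not shown. The input sizes do match `round(10^(7N/80))` with repeated sizes skipped, so a comparable table could be produced along the following lines. This is a sketch under assumptions: `inputSizes` and `timeHash` are hypothetical helpers, Node's built-in `crypto` module stands in for sha.js, and the repetition strategy is a guess.

```javascript
var crypto = require('crypto')

// Sizes grow geometrically up to 10^maxExponent bytes. A run whose rounded
// size repeats the previous one is skipped, which leaves gaps in the run numbers.
function inputSizes (runs, maxExponent) {
  var sizes = []
  var last = 0
  for (var n = 1; n <= runs; n++) {
    var size = Math.round(Math.pow(10, maxExponent * n / runs))
    if (size !== last) sizes.push({ run: n, size: size })
    last = size
  }
  return sizes
}

// Hash `input` repeatedly for at least `minMs` milliseconds and return the
// mean time per hash in milliseconds.
function timeHash (hash, input, minMs) {
  var start = process.hrtime()
  var reps = 0
  var elapsed = 0
  do {
    hash(input)
    reps++
    var d = process.hrtime(start)
    elapsed = d[0] * 1e3 + d[1] / 1e6
  } while (elapsed < minMs)
  return elapsed / reps
}

// Stand-in hash function; swap in the implementation under test.
function sha256 (buf) {
  return crypto.createHash('sha256').update(buf).digest()
}

// Print the first few rows in the same column order as the tables above.
inputSizes(80, 7).slice(0, 5).forEach(function (row) {
  var ms = timeHash(sha256, Buffer.alloc(row.size), 5)
  console.log([row.run, row.size, row.size / ms, ms].join(', '))
})
```

Dropping the `slice(0, 5)` runs the full range up to 10 MB. Single runs are noisy, as the scatter in the tables suggests, so repeating the whole sweep a few times gives a fairer comparison.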
dcousens commented 9 years ago

I'm seeing a ~40% increase in speed, but that seems nuts. Can anyone else confirm?

dcousens commented 9 years ago

I'm seeing a 25% increase in performance for the same optimization in SHA512 as well... which is great. Open to better ways to structure this.

dominictarr commented 9 years ago

I'm only seeing a much smaller improvement, more like 10%.

You might find this interesting: blake2s has its loops completely unrolled. I didn't implement it; I just found it on GitHub and published it to npm.

It's the fastest js hash I have found!

dcousens commented 9 years ago

@dominictarr that is also very difficult to read. IMHO that has to count for something?

Interesting that we got different results. I'm on Arch, also with Node v0.10.35.

dcousens commented 9 years ago

I'm still consistently getting at least a 20% improvement for SHA256. 1 - (2380/2960) = ~20%
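
The arithmetic can be read two ways, depending on which figure is the baseline. A small sketch, assuming both numbers are throughputs in bytes/ms with 2380 for the old code and 2960 for the new (the comment does not state the units):

```javascript
// Figures from the comment above; units assumed to be bytes/ms.
var before = 2380
var after = 2960

// How far the old code falls short of the new one (the formula in the comment).
var shortfall = 1 - before / after

// How much faster the new code is, measured against the old one.
var speedup = after / before - 1

console.log((shortfall * 100).toFixed(1) + '%') // 19.6%
console.log((speedup * 100).toFixed(1) + '%')   // 24.4%
```

Either way the figure sits between the ~10% and ~40% reported earlier in the thread.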

dominictarr commented 9 years ago

@dcousens I think the way to do it well would be some sort of macro that generates it. That would be much easier to audit than hand-written unrolled code.
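
The comment does not say what such a macro would look like. One possible reading, sketched below, is to keep a readable loop as the reference and generate the unrolled statements from a template when the module loads or at build time. Everything here is an assumption for illustration: `unroll`, `extendLoop` and `extendUnrolled` are hypothetical names, and the SHA-256 schedule extension is used only as a familiar loop to unroll.

```javascript
function rotr (x, n) { return (x >>> n) | (x << (32 - n)) }
function sigma0 (x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >>> 3) }
function sigma1 (x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >>> 10) }

// Readable loop: the version a reviewer would audit.
function extendLoop (W) {
  for (var i = 16; i < 64; i++) {
    W[i] = (sigma1(W[i - 2]) + W[i - 7] + sigma0(W[i - 15]) + W[i - 16]) | 0
  }
  return W
}

// "Macro": emit one statement per index instead of writing 48 lines by hand.
function unroll (from, to, emit) {
  var lines = []
  for (var i = from; i < to; i++) lines.push(emit(i))
  return lines.join('\n')
}

var source = unroll(16, 64, function (i) {
  return 'W[' + i + '] = (sigma1(W[' + (i - 2) + ']) + W[' + (i - 7) + ']' +
    ' + sigma0(W[' + (i - 15) + ']) + W[' + (i - 16) + ']) | 0;'
})

// Compile the generated source once. The helpers are passed in explicitly
// because code built with `new Function` cannot see local variables.
var extendUnrolled = new Function(
  'sigma0', 'sigma1',
  'return function (W) {\n' + source + '\nreturn W\n}'
)(sigma0, sigma1)

// Sample input only: 16 arbitrary words, copied so both versions start equal.
var words = []
for (var n = 0; n < 16; n++) words.push((n * 0x01234567) | 0)
console.log(extendLoop(words.slice()).join() === extendUnrolled(words.slice()).join())
```

Only the short template needs auditing, and the loop version doubles as a test oracle. Whether the generated code is actually faster would still have to be benchmarked.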

dominictarr commented 9 years ago

Merged into 2.3.5.

dcousens commented 9 years ago

@dominictarr agreed. I'll compare what performance gains can be had from this, and whether or not the speed gain can be found in something less obtrusive.