Closed cakoose closed 5 years ago
Running the benchmark on macOS 10.14.2, Node 10.14.2. (Not a typo -- exact same version number!)
Old code:
Bindings (current) x 166,786 ops/sec ±0.87% (85 runs sampled)
Pure JS (current) x 200,693 ops/sec ±0.96% (94 runs sampled)
Pure JS (sha3) x 11,886 ops/sec ±1.77% (94 runs sampled)
Pure JS (js-sha3) x 268,402 ops/sec ±0.97% (92 runs sampled)
Buffer 0bytes: fastest is Pure JS (js-sha3)
Bindings (current) x 4.05 ops/sec ±0.45% (15 runs sampled)
Pure JS (current) x 4.58 ops/sec ±1.40% (16 runs sampled)
Pure JS (sha3) x 0.16 ops/sec ±3.66% (5 runs sampled)
Pure JS (js-sha3) x 4.75 ops/sec ±2.82% (16 runs sampled)
Buffer 10MiB: fastest is Pure JS (js-sha3)
New code
Bindings (current) x 324,340 ops/sec ±1.19% (87 runs sampled)
...
Bindings (current) x 27.87 ops/sec ±1.31% (50 runs sampled)
...
(This uses the updated benchmark code from #7.)
@cakoose users with 32-bit CPU will receive segfault?
I think it will run correctly, but it will not run as quickly as the 32-bit optimized code.
Unfortunately, I'm on macOS and can't test because I can't find a 32-bit build of Node.
If you're on Linux, I think you can install a 32-bit build of Node (link) and then do:
$ node-gyp clean configure build --verbose --arch=ia32 # force 32-bit build
...
$ file build/Release/keccak.node # confirm that binary is 32-bit
...
$ npm test
...
Would also be interesting to run the benchmark on 32-bit node and see if the current reference code runs faster than the 64-bit optimized code.
Ideally we would provide both the 32-bit and 64-bit optimized code and have node-gyp select the code depending on the platform. It seems like that should be possible, but unfortunately I know very little about node-gyp.
After https://github.com/cakoose/keccak/pull/1 we will need figure out how support 32-bit version and can release major version.
I figured out how to configure "binding.gyp" to do what we want.
process.arch
is "arm64", "ppc64", or "x64", we will build the 64-bit optimized implementation.NOTE: I tested the 32-bit optimized C code on my 64-bit machine and it still runs faster than the "reference" C code we currently use (1.7x faster on short inputs, 2.5x faster on long inputs).
Awesome! Thank you @cakoose Please check https://github.com/cakoose/keccak/pull/1
I merged your commits into my branch and the tests passed. (I also incorporated your "src/README.md" fix into my own commit.)
Is there anything else you would like me to do?
Thank you for big update and performance improvement! Published as 2.0.0
It looks like we were using the reference C implementation, which is not designed for good performance.
Switching to the non-CPU-specific 64-bit optimized implementation makes us ~2x faster on small inputs and ~6x faster on large inputs.