src/libkeccak: Use XKCP's non-CPU-specific 64-bit optimized implementation

cryptocoinjs / keccak

Keccak sponge function family

MIT License


src/libkeccak: Use XKCP's non-CPU-specific 64-bit optimized implementation #8

Closed cakoose closed 5 years ago

cakoose commented 5 years ago

It looks like we were using the reference C implementation, which is not designed for good performance.

Switching to the non-CPU-specific 64-bit optimized implementation makes us ~2x faster on small inputs and ~6x faster on large inputs.

cakoose commented 5 years ago

Running the benchmark on macOS 10.14.2, Node 10.14.2. (Not a typo -- exact same version number!)

Old code:

Bindings (current) x 166,786 ops/sec ±0.87% (85 runs sampled)
Pure JS (current) x 200,693 ops/sec ±0.96% (94 runs sampled)
Pure JS (sha3) x 11,886 ops/sec ±1.77% (94 runs sampled)
Pure JS (js-sha3) x 268,402 ops/sec ±0.97% (92 runs sampled)
Buffer 0bytes: fastest is Pure JS (js-sha3)
Bindings (current) x 4.05 ops/sec ±0.45% (15 runs sampled)
Pure JS (current) x 4.58 ops/sec ±1.40% (16 runs sampled)
Pure JS (sha3) x 0.16 ops/sec ±3.66% (5 runs sampled)
Pure JS (js-sha3) x 4.75 ops/sec ±2.82% (16 runs sampled)
Buffer 10MiB: fastest is Pure JS (js-sha3)

New code:

Bindings (current) x 324,340 ops/sec ±1.19% (87 runs sampled)
...
Bindings (current) x 27.87 ops/sec ±1.31% (50 runs sampled)
...

(This uses the updated benchmark code from #7.)
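The speedup implied by the "Bindings (current)" lines can be checked with a little arithmetic. This is a minimal sketch that uses only the ops/sec figures quoted above; the `speedup` helper and the `results` object are hypothetical names introduced for illustration, not part of the benchmark code.

```javascript
// ops/sec figures copied from the "Bindings (current)" lines above.
// `results` and `speedup` are illustrative names, not part of the repo.
const results = {
  '0 bytes': { oldOps: 166786, newOps: 324340 },
  '10 MiB': { oldOps: 4.05, newOps: 27.87 },
};

// Speedup is the ratio of new throughput to old throughput.
function speedup(oldOps, newOps) {
  return newOps / oldOps;
}

for (const [input, r] of Object.entries(results)) {
  console.log(`${input}: ${speedup(r.oldOps, r.newOps).toFixed(2)}x`);
}
```

The same helper can be pointed at the "Pure JS" lines to compare the new bindings against the JavaScript implementations.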

fanatid commented 5 years ago

@cakoose Will users with a 32-bit CPU get a segfault?

cakoose commented 5 years ago

I think it will run correctly, but it will not run as quickly as the 32-bit optimized code.

Unfortunately, I'm on macOS and can't test because I can't find a 32-bit build of Node.

If you're on Linux, I think you can install a 32-bit build of Node (link) and then do:

$ node-gyp clean configure build --verbose --arch=ia32  # force 32-bit build
...
$ file build/Release/keccak.node  # confirm that binary is 32-bit
...
$ npm test
...
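If `file` is not available, the same 32-bit check can be approximated from Node by reading the ELF header. This is only a sketch and assumes a Linux build, where the addon is an ELF file: ELF files start with the bytes 0x7F 'E' 'L' 'F', and the byte at offset 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit. The `elfClass` helper is a hypothetical name, not something the repo provides.

```javascript
// Hypothetical helper: report the ELF class of a binary from its first bytes.
// Returns '32-bit', '64-bit', or null if the data is not a recognizable ELF file
// (for example a Mach-O binary on macOS or a PE binary on Windows).
function elfClass(header) {
  const magic = [0x7f, 0x45, 0x4c, 0x46]; // 0x7F 'E' 'L' 'F'
  if (header.length < 5 || !magic.every((b, i) => header[i] === b)) {
    return null;
  }
  if (header[4] === 1) return '32-bit'; // ELFCLASS32
  if (header[4] === 2) return '64-bit'; // ELFCLASS64
  return null;
}

// Usage against the build output from the commands above:
//   const fs = require('fs');
//   console.log(elfClass(fs.readFileSync('build/Release/keccak.node')));

// Self-contained demo with a synthetic 32-bit ELF header.
console.log(elfClass(Buffer.from([0x7f, 0x45, 0x4c, 0x46, 1])));
```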

It would also be interesting to run the benchmark on 32-bit Node and see whether the current reference code runs faster than the 64-bit optimized code.

Ideally we would provide both the 32-bit and 64-bit optimized code and have node-gyp select the code depending on the platform. It seems like that should be possible, but unfortunately I know very little about node-gyp.

fanatid commented 5 years ago

After https://github.com/cakoose/keccak/pull/1, we will need to figure out how to support the 32-bit version, and then we can release a major version.

cakoose commented 5 years ago

I figured out how to configure "binding.gyp" to do what we want.

I imported both the 32-bit optimized C code and the 64-bit optimized C code from XKCP.
If the Node process.arch is "arm64", "ppc64", or "x64", we will build the 64-bit optimized implementation.
Otherwise, we will build the 32-bit optimized implementation.
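The selection rule described above can be restated in a few lines of JavaScript. This is an illustrative sketch only: the real decision is made at build time in "binding.gyp", whose contents are not shown here, and `selectImplementation` plus the returned labels are hypothetical names.

```javascript
// Architectures that get the 64-bit optimized build, as listed above.
const ARCHS_64 = new Set(['arm64', 'ppc64', 'x64']);

// Hypothetical helper mirroring the rule: 64-bit optimized code for the
// listed architectures, 32-bit optimized code for everything else.
function selectImplementation(arch) {
  return ARCHS_64.has(arch) ? 'optimized-64' : 'optimized-32';
}

// process.arch reports the architecture the running Node binary was built for.
console.log(process.arch, '->', selectImplementation(process.arch));
```

Falling back to the 32-bit code for any unlisted architecture keeps the rule conservative: an unknown platform gets the 32-bit optimized code, which per the note below is still faster than the reference code, at least on the 64-bit machine tested.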

NOTE: I tested the 32-bit optimized C code on my 64-bit machine and it still runs faster than the "reference" C code we currently use (1.7x faster on short inputs, 2.5x faster on long inputs).

fanatid commented 5 years ago

Awesome! Thank you, @cakoose. Please check https://github.com/cakoose/keccak/pull/1

cakoose commented 5 years ago

I merged your commits into my branch and the tests passed. (I also incorporated your "src/README.md" fix into my own commit.)

Is there anything else you would like me to do?

fanatid commented 5 years ago

Thank you for the big update and performance improvement! Published as 2.0.0.