Open clementjuventin opened 2 months ago
After further investigation, I found another way of ordering branches (second commit) and obtained what we were looking for (~15% gain on evmmax_mul<uint256, bn254>)!
Benchmark                                 Time      CPU  Time Old  Time New  CPU Old  CPU New
evmmax_mul<uint256, bn254>_median      -0.1529  -0.1529        27        23       27       23
evmmax_mul<uint256, secp256k1>_median  +0.0059  +0.0058        28        28       28       28
I still don't understand why this implementation is faster, even assuming branch prediction makes the checks insignificant. I would also like to compare the generated assembly in case an important optimization is happening under the hood, but I have never done that before, so let's see whether I manage to.
This PR implements the improvement to the CIOS variant of Montgomery multiplication described in the paper "EdMSM: Multi-Scalar-Multiplication for SNARKs and Faster Montgomery multiplication" and proposed in https://github.com/ethereum/evmone/issues/869. The optimization applies when most_significant_p_word < (word_size / 2), where p is represented by the variable mod in the current implementation.
Here are the benchmarks performed after applying these changes, starting from commit 01eca779c58c19c8659fd5a6bf0cad85613c629b.
Build:
cmake --build build --parallel
Benchmarks:
taskset -c 0 ./build/bin/evmone-bench-internal --benchmark_filter=evmmax* --benchmark_repetitions=10 --benchmark_format=json --benchmark_out=cios_classic.json
Comparison:
As you can see, the results do not indicate a performance improvement.
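For reference, here is a minimal standalone sketch of the optimization described above. It is not the code from this PR or from evmone. The names (Limbs, neg_inv64, top_word_allows_nocarry, mont_mul_nocarry) are hypothetical, and the sketch assumes 64-bit little-endian limbs, the GCC/Clang unsigned __int128 extension, and that word_size in the condition means 2^64, so the check reduces to "the top bit of the modulus is clear". The exact bound should be taken from the paper. Under that reading, bn254 (top word 0x30644e72e131a029) qualifies and secp256k1 (top word 0xffffffffffffffff) does not.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // GCC/Clang extension (assumption)

template <std::size_t N>
using Limbs = std::array<u64, N>;  // little-endian 64-bit words

// Assumed reading of `most_significant_p_word < (word_size / 2)`:
// the top word of the modulus must be below 2^63.
template <std::size_t N>
constexpr bool top_word_allows_nocarry(const Limbs<N>& p)
{
    return p[N - 1] < (u64{1} << 63);
}

// Computes -p0^{-1} mod 2^64 for odd p0 by Newton iteration.
constexpr u64 neg_inv64(u64 p0)
{
    u64 x = p0;  // correct to 3 bits, since p0 * p0 == 1 (mod 8)
    for (int i = 0; i < 5; ++i)
        x *= 2 - p0 * x;  // each step doubles the number of correct bits
    return 0 - x;
}

// CIOS Montgomery multiplication without the extra carry words.
// Returns a * b * 2^(-64 * N) mod p for a, b < p.
template <std::size_t N>
Limbs<N> mont_mul_nocarry(const Limbs<N>& a, const Limbs<N>& b, const Limbs<N>& p, u64 inv)
{
    assert(top_word_allows_nocarry(p));
    Limbs<N> t{};
    for (std::size_t i = 0; i < N; ++i)
    {
        u128 acc = static_cast<u128>(a[0]) * b[i] + t[0];
        u64 A = static_cast<u64>(acc >> 64);  // carry of the multiplication row
        const u64 t0 = static_cast<u64>(acc);
        const u64 m = t0 * inv;  // makes the low word of t + m * p vanish
        u64 C = static_cast<u64>((static_cast<u128>(m) * p[0] + t0) >> 64);
        for (std::size_t j = 1; j < N; ++j)
        {
            acc = static_cast<u128>(a[j]) * b[i] + t[j] + A;
            A = static_cast<u64>(acc >> 64);
            const u64 tj = static_cast<u64>(acc);
            acc = static_cast<u128>(m) * p[j] + tj + C;
            C = static_cast<u64>(acc >> 64);
            t[j - 1] = static_cast<u64>(acc);  // shift down by one word
        }
        t[N - 1] = C + A;  // fits in one word only because the top word of p is small
    }

    // Final conditional subtraction: return t - p if t >= p, else t.
    Limbs<N> d{};
    u64 borrow = 0;
    for (std::size_t j = 0; j < N; ++j)
    {
        const u128 diff = static_cast<u128>(t[j]) - p[j] - borrow;
        d[j] = static_cast<u64>(diff);
        borrow = static_cast<u64>(diff >> 64) & 1;
    }
    return borrow ? t : d;
}
```

Compared with the classic CIOS loop, the sketch keeps only N words of state: the two carries A and C are merged directly into t[N-1] instead of being propagated through two extra words. That saving is only valid for moduli that pass the check, which would be consistent with bn254 improving while secp256k1 stays unchanged.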