Open alexcrichton opened 3 months ago
One area that would be particularly interesting to have benchmarks for is programs that depend on fast overflowing, saturating, or checked arithmetic that isn't related to 128-bit math. This would help stress the need for either 128-bit operations or overflow-flag-returning instructions.
A suggestion here is that -ftrapv can inject checked arithmetic for C, and UBSan might rely on this heavily. A naive benchmark didn't show much performance difference relative to native without this proposal, however.
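For anyone who wants to try this from Rust rather than C, here is a minimal sketch of the kind of kernel meant above: the same summation loop written with checked, flag-returning, and saturating additions. The function names are hypothetical and the loop is only an illustration, not a benchmark that was actually measured in this thread.

```rust
/// Sums a slice with an overflow check on every addition.
/// Returns `None` as soon as any addition overflows `u64`.
pub fn checked_sum(values: &[u64]) -> Option<u64> {
    values.iter().try_fold(0u64, |acc, &v| acc.checked_add(v))
}

/// Same loop using the flag-returning form: each step yields
/// (wrapped result, did-it-overflow), and the flags are OR-ed together.
pub fn overflowing_sum(values: &[u64]) -> (u64, bool) {
    let mut acc = 0u64;
    let mut overflowed = false;
    for &v in values {
        let (next, carry) = acc.overflowing_add(v);
        acc = next;
        overflowed |= carry;
    }
    (acc, overflowed)
}

/// Same loop again, clamping at `u64::MAX` instead of wrapping.
pub fn saturating_sum(values: &[u64]) -> u64 {
    values.iter().fold(0u64, |acc, &v| acc.saturating_add(v))
}
```

Comparing these three against a plain wrapping sum, natively and in wasm, would be one way to see whether the overflow checks themselves show up in the timings.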
Just found out about this proposal today, but I've got a strong benchmark candidate here if you're still looking: XXH3 is critically reliant on a wide u64 mul operation for small input sizes, and thus our existing manual WASM implementation performs worse than the older, non-vectorized XXH64 algorithm.
Yeah, it's not just XXH3, but rustc-hash, foldhash, aHash, wyhash, rapidhash, MUM Hash, umash and many more that all rely on 128-bit widening multiplication. That is basically all the fastest non-failing hashing algorithms in the SMhasher benchmarks that don't use AES.
Thanks @marcusdarmstrong and @CryZe! It'll be a bit easier to test and confirm in a few months once rustc and LLVM both support wide-arithmetic, but i64.mul_wide_u should be perfect for 128-bit widening multiplication. Historical benchmarks have all shown that the wasm instructions are suitable for matching native performance in these situations.
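To make the operation under discussion concrete, here is a rough sketch of a 64x64 -> 128-bit widening multiply and a "folded multiply" mixing step of the general kind these hashes build on. The names `mul_wide_u64` and `folded_multiply` are hypothetical, and this is not the exact mixing function of any of the libraries named above. Whether a given toolchain lowers the `u128` multiply to i64.mul_wide_u is an assumption that would need checking once compiler support lands.

```rust
/// Full 64x64 -> 128-bit unsigned product, returned as (low, high) halves.
pub fn mul_wide_u64(a: u64, b: u64) -> (u64, u64) {
    // Widen both operands so the product cannot overflow.
    let product = (a as u128) * (b as u128);
    (product as u64, (product >> 64) as u64)
}

/// Illustrative mixing step: multiply, then XOR the two halves together
/// so the high bits of the product feed back into the 64-bit result.
pub fn folded_multiply(a: u64, b: u64) -> u64 {
    let (lo, hi) = mul_wide_u64(a, b);
    lo ^ hi
}
```

A benchmark that hashes many short keys through a loop of `folded_multiply` calls would spend most of its time in exactly this multiply, which is why these hashes look like good candidates.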
Original development of this proposal benchmarked the blind-sig benchmark in Sightglass as well as the fibonacci benchmarks from the Rust num-bigint repository.

This issue is intended to serve as a location for others to drop interesting benchmark programs as well so they can be collected to help evaluate this proposal over time. If you've got a benchmark you'd like to see added, it would ideally be in C or Rust at this time and ideally be a program that has a means of self-reporting its execution time. High-level ideas are ok too but would require some more work to create a reproducible benchmark.
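As a rough illustration of what "self-reporting its execution time" could look like, here is a minimal sketch in Rust. The kernel is a two-limb Fibonacci with manual carry propagation, chosen only because it exercises add-with-carry; it is not the num-bigint benchmark mentioned above, and the names `fib_u128_limbs` and `time_it` are hypothetical.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Computes F(n) modulo 2^128 using two u64 limbs and an explicit carry,
/// returned as (low limb, high limb).
pub fn fib_u128_limbs(n: u32) -> (u64, u64) {
    let (mut a_lo, mut a_hi) = (0u64, 0u64); // F(0)
    let (mut b_lo, mut b_hi) = (1u64, 0u64); // F(1)
    for _ in 0..n {
        // Add the low limbs, then fold the carry into the high limbs.
        let (lo, carry) = a_lo.overflowing_add(b_lo);
        let hi = a_hi.wrapping_add(b_hi).wrapping_add(carry as u64);
        a_lo = b_lo;
        a_hi = b_hi;
        b_lo = lo;
        b_hi = hi;
    }
    (a_lo, a_hi)
}

/// Runs `f` `iters` times and returns the total elapsed wall-clock time.
/// `black_box` keeps the optimizer from discarding the results.
pub fn time_it<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}
```

A `main` that prints something like `time_it(1_000_000, || fib_u128_limbs(black_box(150)))` with `{:?}` would then report its own timing, and the same source can be built for both native and wasm targets.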