Closed andygrove closed 4 months ago
Are we expecting a performance improvement as a result? (I doubt stack allocations are a performance bottleneck)
Linking this to #517 too.
Are we expecting a performance improvement as a result? (I doubt stack allocations are a performance bottleneck)
+1. I think we may need to figure out the root cause first. It might that the implementation of XxHash64
introduces some performance bottleneck, rather the stack allocation. I can try to reproduce and check this issue this weekend if I have some spare time.
Thanks @advancedxy. Yes, it definitely needs more investigation.
Here are benchmark results comparing murmur3 to xxhash64. xxhash64 is 3x slower (not sure if that is the expectation) but what is more interesting is that there is a warning during warmup (Unable to complete 100 samples in 5.0s).
Running benches/hash.rs (target/release/deps/hash-3beecac83c235dce)
Gnuplot not found, using plotters backend
hash/murmur3/8192 time: [342.78 µs 345.32 µs 349.49 µs]
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
Benchmarking hash/xxhash64/8192: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.0s, enable flat sampling, or reduce sample count to 70.
hash/xxhash64/8192 time: [992.21 µs 994.27 µs 996.88 µs]
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
@advancedxy I am working on re-implementing this now in a simpler way as an experiement ... I will aim to have a PR up by end of day
xxhash64 is 3x slower (not sure if that is the expectation)
xxhash64 is by no means slower than murmur3. Here's a sample benchmark: https://lz4.github.io/lz4-java/1.3.0/xxhash-benchmark/
I am working on re-implementing this now in a simpler way as an experiement ... I will aim to have a PR up by end of day
Thanks for your effort.
Here are benchmark results comparing murmur3 to xxhash64. xxhash64 is 3x slower (not sure if that is the expectation)
I don't think this is expected as @parthchandra already pointed out xxhash64 should be much faster than murmur3. Even the TPC-H q8 is not related to XXHash64, we should still investigate it further and improve its performance. I can help with that, though it's not urgent then.
What is the problem the feature request solves?
Here is the current xxhash64 implementation:
As the TODO comment states, it may be better to implement without stack allocation.
Here is Spark's version for reference:
Describe the potential solution
No response
Additional context
No response