apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[C++] Move murmur3 hash implementation to arrow/util #19635

Closed: asfimport closed this issue 5 years ago

asfimport commented 5 years ago

It would be good to consolidate the hashing utility code in a central place (it currently lives in src/parquet).

Reporter: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-3298. Please see the migration documentation for further details.

asfimport commented 5 years ago

Wes McKinney / @wesm: We should take a look at this after ARROW-2653 goes in. Moving this to 0.13.

asfimport commented 5 years ago

Antoine Pitrou / @pitrou: What complicates things a bit is that there are several different versions of Murmur (murmur2, murmur3, 32-bit-hash-producing, 64-bit-hash-producing) and also potentially several different implementations of each (with different performance characteristics).

So some review of current usage across the codebase (Arrow, Plasma, Parquet, Gandiva) is needed. @fsaintjacques
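To make the variant question concrete, here is a minimal sketch of just one of the variants named above, the 32-bit MurmurHash3 (the "x86_32" flavor). It follows the publicly known algorithm but is not the code in src/parquet or any Arrow API; the names `murmur3_x86_32` and `rotl32` are hypothetical, and the block loads assume a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: rotate a 32-bit value left by r bits (0 < r < 32).
inline uint32_t rotl32(uint32_t x, int r) {
  return (x << r) | (x >> (32 - r));
}

// Sketch of MurmurHash3 x86_32. Hypothetical name, not an Arrow/Parquet API.
// Assumes a little-endian host when loading 4-byte blocks.
inline uint32_t murmur3_x86_32(const void* key, std::size_t len, uint32_t seed) {
  const uint8_t* data = static_cast<const uint8_t*>(key);
  const std::size_t nblocks = len / 4;
  const uint32_t c1 = 0xcc9e2d51u;
  const uint32_t c2 = 0x1b873593u;
  uint32_t h = seed;

  // Body: mix the input four bytes at a time.
  for (std::size_t i = 0; i < nblocks; ++i) {
    uint32_t k;
    std::memcpy(&k, data + 4 * i, sizeof(k));  // avoids unaligned reads
    k *= c1;
    k = rotl32(k, 15);
    k *= c2;
    h ^= k;
    h = rotl32(h, 13);
    h = h * 5 + 0xe6546b64u;
  }

  // Tail: mix the remaining 0 to 3 bytes.
  const uint8_t* tail = data + 4 * nblocks;
  const std::size_t rem = len & 3;
  uint32_t k = 0;
  if (rem >= 3) k ^= static_cast<uint32_t>(tail[2]) << 16;
  if (rem >= 2) k ^= static_cast<uint32_t>(tail[1]) << 8;
  if (rem >= 1) {
    k ^= static_cast<uint32_t>(tail[0]);
    k *= c1;
    k = rotl32(k, 15);
    k *= c2;
    h ^= k;
  }

  // Finalization: fold in the length, then force avalanche of all bits.
  h ^= static_cast<uint32_t>(len);
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}
```

A murmur2 implementation or a 64-bit or 128-bit murmur3 variant would produce different values for the same input, which is why call sites cannot be pointed at a single shared implementation without checking which variant each one expects.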

asfimport commented 5 years ago

Antoine Pitrou / @pitrou: With the move to xxh3, Arrow is probably ditching MurmurHash for good. Perhaps we can simply close this issue.

asfimport commented 5 years ago

Wes McKinney / @wesm: Parquet has also dropped MurmurHash from the Bloom filter implementation: https://github.com/apache/parquet-format/commit/8f1783ec0b273e89c884b46c0f527d0a48321826#diff-d96aef0e8954afde569c8b40b8748081. So I'll close this one.