Open js8544 opened 1 year ago
I don't know what a "Swiss table" is. Can you point to a general description?
More generally, I'm all for a faster associative table implementation.
Maybe we should first having an memory-usage / performance benchmark on this?
There's an Abseil's article and a CppCon talk that provide a good intro. The key idea is, when looking up a key via linear probing, it can check multiple slots at once.
Ah, so it's a hash table with SIMD-optimized lookup?
Maybe we should first having an memory-usage / performance benchmark on this?
Okay, I'll try to draft a proof of concept with the is_in
kernel.
If you're going to use SIMD lookups, then I would ask that it relies on xsimd instead of intrinsics.
Ah, so it's a hash table with SIMD-optimized lookup?
Well it's a speedup even without SIMD instructions. To put it in a oversimplified term each slot occupies 1 byte so with int64 we can already check 8 slots at the same time. But SIMD can definitely make it even faster.
If you're going to use SIMD lookups, then I would ask that it relies on xsimd instead of intrinsics.
Sure, if we decide to write our own version.
Well it's a speedup even without SIMD instructions. To put it in a oversimplified term each slot occupies 1 byte so with int64 we can already check 8 slots at the same time. But SIMD can definitely make it even faster.
I see, thanks for the explanation.
Sure, if we decide to write our own version.
Well, we don't want to depend on Abseil, so our own version it should probably be :-)
There's an Abseil's article and a CppCon talk that provide a good intro. The key idea is, when looking up a key via linear probing, it can check multiple slots at once.
Thank you for explaining. I was also curious about what Swiss Join is too. :)
I wrote a simple proof of concept for IndexIn
[1] with a slightly tweaked version of absl::flat_hash_map
. (Tweak is [2], a couple of operations can be skipped because our MemoTable
doesn't need to support deletion). And the existing IndexIn
benchmarks (results below) show about 20% ~ 60% improvement depending on the value_set
size, the larger the better.
The benchmarks are run on my Mac M1 so the results may vary on a x86 machine. But I think the diff is compelling enough for us to proceed with a Swiss table implementation.
[1] https://github.com/apache/arrow/compare/main...js8544:jinshang/swiss_proof_of_concept?diff=split#diff-d288c253c3a5959aa1b6840a649a8dafe42d14ac49d1837b19dd3a44c58f7fd9R93 [2] https://github.com/apache/arrow/commit/7d14d10d67826f4be7588b8b36d7aef7655dbcb3
Can you post benchmark numbers on string values? While hashing integers is nice, I'm not sure that's a dominant use case.
Sure. I chose Int32 because it was the only one tested with a large value set. I just wrote a similar set of benchmarks for strings and here's the result.
Ok. I agree we could replace MemoTable with something better. However, we don't want to have a dependency on Abseil, so it will have to be reimplemented.
Ok. I agree we could replace MemoTable with something better. However, we don't want to have a dependency on Abseil, so it will have to be reimplemented.
Yes. I'll implement one using xsimd
as you suggested. Also the swiss table may be worse than MemoTable
on super small tables, since it checks 8 slots at once and reserves at least 8 slots on construction, as shown in the first two string benchmarks. We'll need further benchmarking to decide which hashtable to use in each individual use cases, or maybe with a dynamic dispatching mechanism.
I guess this might benifit the parquet dictionary encoding, which is widely used in parquet. If nobody move forward, I'll have a try next month
Yeah I'm afraid I won't have time to do this recently. Please have a try if you wish. Thanks!
Will the swiss table currently being used in acero's hash join (aka. swiss join) help on this?
The swiss table for hash join is very specialized and kind of hard to generalize. It can't be directly used for memotable.
Could I try to solve this issue?
@SGZW Anyone can work on an issue without asking for permission. Feel free to experiment and submit a PR when ready.
Describe the enhancement requested
This is a follow up of https://github.com/apache/arrow/issues/36059#issuecomment-1592037527. There are many cases where we use a
MemoTable
, e.g. set lookup functions, vector hash functions, thecount_distinct
aggregate function, dictionary unification etc. Their performance can be boosted with a SwissTable.We can either:
absl::flat_hash_map
since they are the original author of swiss tables and we already hasabsl
in our 3rd party toolchain.acero
is too customized for the join node and it seems hard to extract a general hashtable from it. If I were to write one I would probably follow the structure ofabsl
's and replace things like memory management and bit tweaking with Arrow ones.What do you think? @pitrou @westonpace
Component(s)
C++