databendlabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.88k stars 753 forks source link

Feature: Radix hash join #7315

Open leiysky opened 2 years ago

leiysky commented 2 years ago

Radix hash join is a join algorithm that partitions input data of both sides into small buckets that can fit CPU cache, to improve the performance of hash join.

In opposite to radix hash join, we are using a shared hash table fashion now, which cannot utilize with CPU cache as good as radix hash join but works well with our Processor execution engine.

It will be nice if we can provide an option to use radix hash join in a query.

xudong963 commented 2 years ago

LGTM, https://15721.courses.cs.cmu.edu/spring2018/papers/19-hashjoins/schuh-sigmod2016.pdf

sundy-li commented 2 years ago

We have a radix partitioned hash table which named as TwoLevelHashTable. https://duckdb.org/2022/03/07/aggregate-hashtable.html