Open ankitsultana opened 1 year ago
There is also something called as Symmetric Hash Join.
The idea is to increase pipelining and be able to produce the output / matching result tuple sooner than later instead of only after finishing the build phase entirely
So, instead of first building the hash table on one side and then probing --
SHJ builds 2 hash tables - one for each side of the JOIN. As soon as an input tuple arrives for T1, it is inserted into the respective hash table for T1 and then immediately used to probe the hash table for T2. If a match is found, output tuple can potentially be produced.
So the operator essentially alternates between reading both sides and probing each other to produce matching tuples asap until one side is exhausted. IIRC, Postgres had plans of implementing SHJ but not sure if it is part of the codebase.
@ankitsultana - is your suggestion somewhat along the above or something different ?
At present we first the process the right sub-tree of a join to build a HashTable and only after that's done do we process the left sub-tree.
Ideally we should be able to process the two sub-trees concurrently (bushy parallelism).
cc: @walterddr