Open jjeffcaii opened 1 year ago
The HASH-JOIN dataset API could look similar to the code below:
func HashJoin(left, right Dataset, joinColumns, ...other options) Dataset {
    // ...
}
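To make the proposed signature concrete, here is a minimal sketch in valid Go. Everything except the name HashJoin is an assumption for illustration: Row, Dataset, JoinOption and WithPrefixes are hypothetical stand-ins rather than the project's real types, the join columns are passed as one column name per side, and the "other options" are modelled as functional options.

```go
package main

import "fmt"

// Row, Dataset and JoinOption are hypothetical stand-ins, not the project's real types.
type Row map[string]any
type Dataset []Row

type joinConfig struct{ leftPrefix, rightPrefix string }

// JoinOption is one possible way to model the "...other options" of the proposal.
type JoinOption func(*joinConfig)

// WithPrefixes sets the prefixes used to name the output columns.
func WithPrefixes(left, right string) JoinOption {
	return func(c *joinConfig) { c.leftPrefix, c.rightPrefix = left, right }
}

// HashJoin returns one output row per pair with left[leftCol] == right[rightCol].
// Join values must be comparable (ints, strings, ...) because they are used as map keys.
func HashJoin(left, right Dataset, leftCol, rightCol string, opts ...JoinOption) Dataset {
	cfg := joinConfig{leftPrefix: "left.", rightPrefix: "right."}
	for _, opt := range opts {
		opt(&cfg)
	}
	// Index the left side by join value; Go's built-in map does the hashing here.
	index := make(map[any][]Row)
	for _, l := range left {
		index[l[leftCol]] = append(index[l[leftCol]], l)
	}
	// Look up every right row and emit the merged rows.
	var out Dataset
	for _, r := range right {
		for _, l := range index[r[rightCol]] {
			joined := Row{}
			for k, v := range l {
				joined[cfg.leftPrefix+k] = v
			}
			for k, v := range r {
				joined[cfg.rightPrefix+k] = v
			}
			out = append(out, joined)
		}
	}
	return out
}

func main() {
	foo := Dataset{{"id": "a", "x": 5}, {"id": "b", "x": 6}, {"id": "c", "x": 7}}
	bar := Dataset{{"id": "j", "y": 5}, {"id": "k", "y": 8}}
	for _, row := range HashJoin(foo, bar, "x", "y", WithPrefixes("foo.", "bar.")) {
		fmt.Println(row["foo.id"], row["bar.id"]) // prints: a j
	}
}
```

Functional options keep the required arguments short while leaving room for later additions, but a plain options struct would work just as well.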
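The HASH-JOIN should contain two phases:
- build: build a hash map from one dataset, with key=hash(values of join_columns) and value=rows
- probe: for each row of the other dataset, compute the same hash and check the rows stored under that key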
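As a rough illustration of the first phase, the sketch below hashes the values of the join columns into a single key and groups rows under it. The names Row, hashKey and build are hypothetical, and FNV-1a from the standard library is only one possible hash function; the issue does not prescribe one. Because different values can share a key, a second phase would still have to compare the real column values.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Row is a hypothetical stand-in for a dataset row.
type Row map[string]any

// hashKey hashes the values of the join columns into one 64-bit key (FNV-1a).
func hashKey(row Row, joinColumns []string) uint64 {
	h := fnv.New64a()
	for _, col := range joinColumns {
		// The separator byte keeps ("ab", "c") and ("a", "bc") from producing the same input.
		fmt.Fprintf(h, "%v\x00", row[col])
	}
	return h.Sum64()
}

// build creates the hash map: key = hash(values of join_columns), value = rows.
func build(rows []Row, joinColumns []string) map[uint64][]Row {
	table := make(map[uint64][]Row)
	for _, row := range rows {
		k := hashKey(row, joinColumns)
		table[k] = append(table[k], row)
	}
	return table
}

func main() {
	rows := []Row{{"x": 5}, {"x": 6}, {"x": 5}}
	table := build(rows, []string{"x"})
	fmt.Println(len(table)) // number of distinct keys
}
```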
some docs:
A tiny example: we have two datasets, foo and bar, and we want to execute SQL like:
select foo.id, bar.id from foo join bar on foo.x = bar.y
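foo (rows written as id-x): a-5, b-6, c-7
bar (rows written as id-y): j-5, k-8
- build foo: a hash map built with the hash method x -> x%2 gives { 0: [b-6], 1: [a-5, c-7] }
- probe bar: j-5 checks chunk[key=1] and k-8 checks chunk[key=0]
- chunk[key=1] holds a-5, which has the same value as j-5, so a-5 and j-5 match, bingo! chunk[key=0] holds only b-6, and 6 != 8, so k-8 matches nothing.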
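The example can be traced with a few lines of Go. This is only a sketch under assumptions: the row struct and the build/probe helpers are hypothetical names, and the foo rows (a-5, b-6, c-7) are read off the map shown in the example.

```go
package main

import "fmt"

// row is one record written as id-value in the example, e.g. a-5.
type row struct {
	id  string
	val int // foo.x or bar.y
}

// hash is the toy hash method from the example: x -> x%2.
func hash(x int) int { return x % 2 }

// build groups the rows of one dataset into chunks keyed by hash(val).
func build(rows []row) map[int][]row {
	chunks := make(map[int][]row)
	for _, r := range rows {
		chunks[hash(r.val)] = append(chunks[hash(r.val)], r)
	}
	return chunks
}

// probe looks up the chunk of each row and keeps only pairs whose values are really equal.
func probe(chunks map[int][]row, rows []row) [][2]string {
	var out [][2]string
	for _, r := range rows {
		for _, c := range chunks[hash(r.val)] {
			if c.val == r.val { // sharing a chunk does not imply equal values
				out = append(out, [2]string{c.id, r.id})
			}
		}
	}
	return out
}

func main() {
	foo := []row{{"a", 5}, {"b", 6}, {"c", 7}}
	bar := []row{{"j", 5}, {"k", 8}}
	chunks := build(foo)
	fmt.Println(chunks)             // prints map[0:[{b 6}] 1:[{a 5} {c 7}]]
	fmt.Println(probe(chunks, bar)) // prints [[a j]]
}
```

The equality check inside probe is what rejects c-7 for j-5 and b-6 for k-8, even though they share a chunk.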
pls assign to me