Eventual-Inc / Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust
https://getdaft.io
Apache License 2.0

cross joins #2994

Open universalmind303 opened 1 month ago

universalmind303 commented 1 month ago

Is your feature request related to a problem? Please describe.
I want to perform cross joins using daft

Describe the solution you'd like
df1.join(df2, how='cross')

Describe alternatives you've considered
df1.join(df2, on=lit(1))
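
For readers unfamiliar with the semantics being requested, here is a minimal sketch in plain Python. It is not Daft code: rows are modeled as dicts, and `cross_join` and `join_on_constant` are hypothetical helper names made up for illustration. It shows that a cross join pairs every left row with every right row, and that joining on a constant key (the lit(1) workaround) yields the same rows.

```python
from itertools import product


def cross_join(left, right):
    """Pair every left row with every right row (Cartesian product)."""
    return [{**l, **r} for l, r in product(left, right)]


def join_on_constant(left, right):
    """Emulate the constant-key workaround: equi-join on a key that is 1 for every row."""
    keyed_left = [(1, row) for row in left]
    keyed_right = [(1, row) for row in right]
    return [
        {**l, **r}
        for kl, l in keyed_left
        for kr, r in keyed_right
        if kl == kr  # always true, so every pair survives
    ]


# Illustrative data only; column names are assumptions.
left = [{"id": 1}, {"id": 2}]
right = [{"tag": "a"}, {"tag": "b"}, {"tag": "c"}]

crossed = cross_join(left, right)
print(len(crossed))  # 2 left rows x 3 right rows = 6
```

The workaround gives the right answer, but a dedicated `how='cross'` would state the intent directly and would not require a dummy key column.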

universalmind303 commented 1 month ago

additionally,

a cross join followed by a filter comparing columns between the two inputs should be optimized into an inner join

example:

df1.join(df2, how='cross').where(df1['text'] == df2['name'])

this can be optimized to df1.join(df2, left_on=col('text'), right_on=col('name'), how='inner')
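
A rough sketch of why this rewrite is safe, again in plain Python rather than Daft: `cross_then_filter` and `hash_inner_join` are hypothetical names, rows are dicts, and the hash join is only one possible way an engine might execute an inner equi-join. Both functions return the same rows, but the second never materializes the full Cartesian product.

```python
from collections import defaultdict
from itertools import product


def cross_then_filter(left, right, left_key, right_key):
    """Naive plan: build every pair, then keep those whose keys match."""
    return [
        {**l, **r}
        for l, r in product(left, right)
        if l[left_key] == r[right_key]
    ]


def hash_inner_join(left, right, left_key, right_key):
    """Rewritten plan: index the right side by key, then probe with each left row."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


# Illustrative data only; 'text' and 'name' mirror the columns in the example above.
df1_rows = [{"text": "a"}, {"text": "b"}, {"text": "z"}]
df2_rows = [{"name": "b", "n": 1}, {"name": "a", "n": 2}, {"name": "a", "n": 3}]

naive = cross_then_filter(df1_rows, df2_rows, "text", "name")
rewritten = hash_inner_join(df1_rows, df2_rows, "text", "name")
print(naive == rewritten)
```

The naive plan examines len(left) * len(right) pairs, while the hash-based plan does roughly len(left) + len(right) work plus the size of the output, which is the motivation for the optimizer rule.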

universalmind303 commented 4 weeks ago

For reference, DataFusion has an eliminate_cross_join rule that rewrites cross joins to inner joins where possible:

https://github.com/GlareDB/arrow-datafusion/blob/20b298e9d82e483e28087e595c409a8cc04872f3/datafusion/optimizer/src/eliminate_cross_join.rs#L44

universalmind303 commented 4 weeks ago

created #3095 for the optimizer rule.