Closed VibhuJawa closed 4 years ago
I just tried the following and it worked for me:
import cudf
bc = BlazingContext()
bc.s3('bsql_data', bucket_name='blazingsql-colab', access_key_id='AKIAJGB3SR3IXU3TE5WA', secret_key='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')
bc.create_table('nation', 's3://bsql_data/tpch_sf1/nation/0_0_0.parquet')
result = bc.sql('select * from nation where nation.n_nationkey in ( select other.n_nationkey from nation as other where n_nationkey = 16)').get()
print(result.columns)
The output i got was
0 16 MOZAMBIQUE 0
n_comment
0 s. ironic, unusual asymptotes wake blithely r ```
Can you show me a complete example where this is not working how you would expect?
Sorry for being unclear ,
I would like left-join
to work natively i.e , i would like below sql query to work.
SELECT e.EmpName, e.DepID
FROM @employees AS e
LEFT SEMIJOIN (SELECT (int?) DepID AS DepID, DepName FROM @departments) AS d
ON e.DepID == d.DepID;
Question
Will there be performance implications for using SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern
instead of left-semi join
or do you expect the performance to remain the same ?
this is the relational algebra it generates. I am not sure what the semi join would generate but I am guessing something very similar.
LogicalJoin(condition=[=($0, $4)], joinType=[inner])
LogicalTableScan(table=[[main, nation]])
LogicalAggregate(group=[{0}])
BindableTableScan(table=[[main, nation]], filters=[[=($0, 16)]], projects=[[0]], aliases=[[n_nationkey]])
In the future we will be able to optimize out LogicalAggregate(group=[{0}])
using the CBO when the column in question is unique but that is currently not supported.
this is the relational algebra it generates. I am not sure what the semi join would generate but I am guessing something very similar.
LogicalJoin(condition=[=($0, $4)], joinType=[inner]) LogicalTableScan(table=[[main, nation]]) LogicalAggregate(group=[{0}]) BindableTableScan(table=[[main, nation]], filters=[[=($0, 16)]], projects=[[0]], aliases=[[n_nationkey]])
Thanks for looking into this, will update on this issue if using this pattern is not as performent as we would like on my use-case .
I would like for
blazing-sql
to support semi join .Semijoins are a way to filter a rowset based on the inclusion of its rows in another rowset.
SQL equivalent:
There are two variants:
LEFT SEMIJOIN
andRIGHT SEMIJOIN
.Details about semi join from :
https://docs.microsoft.com/en-us/u-sql/statements-and-expressions/select/from/joins/semijoin