BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0
1.92k stars 181 forks source link

[BUG] Joining on key columns pairs with different equality operators gives wrong output #1421

Closed rommelDB closed 3 years ago

rommelDB commented 3 years ago

Given the following query:

"""SELECT * FROM df df1 INNER JOIN df df2 ON df1.a = df2.b WHERE df1.b is not distinct from df2.b"""

that produces the logical plan below:

LogicalProject(a=[$0], b=[$1], a0=[$2], b0=[$3])
  LogicalJoin(condition=[AND(=($0, $3), IS NOT DISTINCT FROM($1, $3))], joinType=[inner])
    LogicalTableScan(table=[[main, df]])
    LogicalTableScan(table=[[main, df]])

gives a wrong output (depending on the data) because the join condition has both '=' and 'IS_NOT_DISTINCT_FROM', which is not supported in the underlying join function.