"""SELECT * FROM df df1 INNER JOIN df df2 ON df1.a = df2.b WHERE df1.b is not distinct from df2.b"""
that produces the logical plan below:
LogicalProject(a=[$0], b=[$1], a0=[$2], b0=[$3])
LogicalJoin(condition=[AND(=($0, $3), IS NOT DISTINCT FROM($1, $3))], joinType=[inner])
LogicalTableScan(table=[[main, df]])
LogicalTableScan(table=[[main, df]])
gives a wrong output (depending on the data) because the join condition has both '=' and 'IS_NOT_DISTINCT_FROM', which is not supported in the underlying join function.
Given the following query:
that produces the logical plan below:
gives a wrong output (depending on the data) because the join condition has both '=' and 'IS_NOT_DISTINCT_FROM', which is not supported in the underlying join function.