Open berkaysynnada opened 1 month ago
Just some thoughts: maybe it's better not to expose this flag, but set it while planning in case of queries like
with data as (
select 1 as f
union all
select 2 as f
union all
select null as f
)
select * from data t1
join data t2
on t1.f is not distinct from t2.f; -- it (probably) may also be cool to support <=> operator here
right now it's planned as cross (technically) join with filter, but in fact this query may be a hash join with null equals null. And specifying join conditions like this (via is not distinct
or spaceship operator) potentially more convenient to control join behaviour (one query statement instead of set datafusion.xxx = true
+ query statement)
UPD:
And in the end this field, probably, is not needed (in its current form at least) as it won't help to handle queries like
JOIN ... ON left.x = right.x AND left.y <=> right.y
That makes sense. If such usages exist, there's no need to expose a new flag. It would be great if we could handle these two versions and link them to our internal join flags.
Is your feature request related to a problem or challenge?
Currently the
null_equals_null
flag in DF joins is hard-coded to false across multiple places in the code. It is not configurable either from a query or session configuration. We need to make this flag configurable to allow more flexibility in how joins handle null values.Describe the solution you'd like
I am not sure which alternative is more useful and expected, and I couldn't find any examples. If we select it as a session config, users will need to change the setting for each preference change. On the other hand, if it is provided within the query, users must explicitly specify it for each non-default case.
Describe alternatives you've considered
Perhaps both solutions can coexist.
Additional context
No response