Open sarahyurick opened 1 year ago
I will take a look. There is a rule, FilterNullJoinKeys, which filters out null join keys.
It looks like this optimization is disabled by default. Could you please confirm which version of DataFusion you were running previously?
pub filter_null_join_keys: bool, default = false
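For readers unfamiliar with why such a rule is safe, here is a minimal sketch in plain Rust. It does not use DataFusion; `Key`, `inner_join`, and `filter_not_null` are hypothetical names made up for illustration. It models SQL NULL as `None` and shows that dropping null keys before an inner equijoin does not change the result, because a NULL key never satisfies the equality.

```rust
/// A toy row holding a single nullable join key. `None` models SQL NULL.
/// (Hypothetical type for illustration, not a DataFusion type.)
type Key = Option<i32>;

/// Naive inner equijoin over two key columns.
/// In SQL, `NULL = NULL` is not true, so a `None` key never produces a match.
fn inner_join(left: &[Key], right: &[Key]) -> Vec<(i32, i32)> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            if let (Some(a), Some(b)) = (l, r) {
                if a == b {
                    out.push((*a, *b));
                }
            }
        }
    }
    out
}

/// Toy equivalent of applying `key IS NOT NULL` below the join.
fn filter_not_null(rows: &[Key]) -> Vec<Key> {
    rows.iter().copied().filter(|k| k.is_some()).collect()
}
```

Joining `filter_not_null(&left)` with `filter_not_null(&right)` yields the same pairs as joining the unfiltered inputs, while the join itself has fewer rows to scan.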
Thanks @mingmwang! Knowing which rule/variable to look at is helpful. I'm still not able to see the IS NOT NULL filters though, even when I explicitly initialize with let config = OptimizerContext::new().filter_null_keys(true);
Is it possible that IS NOT NULL isn't being added because it's an Inner Join: Filter: d_table.d_col = c_table.c_col instead of an Inner Join: c_table.c_col = d_table.d_col?
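To make the hypothesis concrete, here is a toy model in plain Rust. It is only a sketch under an unconfirmed assumption, namely that the rule derives its predicates from the equijoin keys and ignores the join filter. `ToyJoin`, `null_filters_from_keys`, and `has_only_filter_condition` are hypothetical names and do not reflect DataFusion's actual types or implementation.

```rust
/// Hypothetical, simplified stand-in for an inner join node.
struct ToyJoin {
    /// Equijoin key pairs, as in `Inner Join: c_table.c_col = d_table.d_col`.
    on: Vec<(String, String)>,
    /// Extra predicate, as in `Inner Join: Filter: d_table.d_col = c_table.c_col`.
    filter: Option<String>,
}

/// Hypothetical rule: derive `IS NOT NULL` predicates from the `on` keys only.
/// ASSUMPTION: the join filter is never inspected.
fn null_filters_from_keys(join: &ToyJoin) -> Vec<String> {
    join.on
        .iter()
        .flat_map(|(l, r)| {
            [
                format!("{} IS NOT NULL", l),
                format!("{} IS NOT NULL", r),
            ]
        })
        .collect()
}

/// True when the equality lives only in the filter, the case asked about above.
fn has_only_filter_condition(join: &ToyJoin) -> bool {
    join.on.is_empty() && join.filter.is_some()
}
```

Under that assumption, a join whose equality sits in `filter` with an empty `on` list would produce no IS NOT NULL predicates at all, which would match the behavior described here. Whether the real rule behaves this way is exactly the open question.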
I will take a closer look.
Describe the bug
In the Dask-SQL project, we have relied on DataFusion to create IS NOT NULL filters at the TableScan level whenever a column is involved in a join. However, it looks like recent changes may have removed this feature?
To Reproduce
The query
has the
LogicalPlan
Expected behavior
It still works when we write the query with a WHERE clause.
produces
Additional context
I'm not sure when this change was introduced or why. Is this something that DataFusion would be willing to fix, or would it be preferable for Dask-SQL to re-add the optimizer rule on our side?
cc @ayushdg @jdye64