datafusion-python users requested pyarrow predicate pushdown support for temporal types.
What changes are included in this PR?
IsNull bug
The conversion incorrectly passed the column expression as an argument to the pyarrow method is_null. This failed silently, and the predicate was excluded from the plan.
The argument should instead be a scalar value for nan_is_null. I do not currently have a way for users to pass that in, so please suggest how I might do so.
Temporal Scalars
Similar to #731, I used ScalarValue::to_pyarrow for the scalar conversion. As a result, pyarrow filters can now accept any scalar that already has an upstream conversion.
Are there any user-facing changes?
A bugfix and expanded functionality.
Additional Context
I tested the predicate pushdown in two separate ways.
1) Ensuring that the explain plan contains the appropriate string.
2) Ensuring that a query on a partitioned dataset doesn't touch the file.
Both of these seem non-ideal. If you have a suggestion for more efficiently testing this, please share!
Which issue does this PR close?
Closes #703.
Rationale for this change
The conversion for IsNull had a bug.