Open yjshen opened 2 years ago
@alamb could you please provide some context on why we are using Expr instead of PhysicalExpr for ParquetExec in the first place?
I don't know of any original rationale -- I think @yordan-pavlov's initial implementation was in terms of Expr
I think it would be fine to use physical exprs instead (as we can always translate from Expr --> PhysicalExpr)
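To illustrate the Expr --> PhysicalExpr direction mentioned above, here is a minimal, self-contained sketch. It deliberately does not use DataFusion's real types: the `Expr` enum, the `PhysicalExpr` trait, the `ColumnExpr`/`LiteralExpr`/`GtExpr` structs, and this `create_physical_expr` function are all toy stand-ins assumed for illustration, and the "schema" is just a name-to-index map. The point is only that a logical expression refers to columns by name, and a translation step can resolve those names against a schema to get something directly evaluable.

```rust
use std::collections::HashMap;

// Toy logical expression (NOT DataFusion's Expr): columns are referenced by name.
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
}

// Toy physical expression (NOT DataFusion's PhysicalExpr): evaluable against a row.
trait PhysicalExpr {
    fn evaluate(&self, row: &[i64]) -> i64;
}

// Hypothetical physical nodes; columns are already resolved to indexes.
struct ColumnExpr {
    index: usize,
}
struct LiteralExpr {
    value: i64,
}
struct GtExpr {
    left: Box<dyn PhysicalExpr>,
    right: Box<dyn PhysicalExpr>,
}

impl PhysicalExpr for ColumnExpr {
    fn evaluate(&self, row: &[i64]) -> i64 {
        row[self.index]
    }
}

impl PhysicalExpr for LiteralExpr {
    fn evaluate(&self, _row: &[i64]) -> i64 {
        self.value
    }
}

impl PhysicalExpr for GtExpr {
    // Encodes the boolean result as 1 (true) or 0 (false) to keep the toy single-typed.
    fn evaluate(&self, row: &[i64]) -> i64 {
        (self.left.evaluate(row) > self.right.evaluate(row)) as i64
    }
}

// Hypothetical translation step: resolve column names against the schema
// (a name -> index map here) and build the physical tree recursively.
fn create_physical_expr(
    expr: &Expr,
    schema: &HashMap<String, usize>,
) -> Result<Box<dyn PhysicalExpr>, String> {
    match expr {
        Expr::Column(name) => match schema.get(name) {
            Some(&index) => Ok(Box::new(ColumnExpr { index })),
            None => Err(format!("unknown column: {}", name)),
        },
        Expr::Literal(value) => Ok(Box::new(LiteralExpr { value: *value })),
        Expr::Gt(left, right) => Ok(Box::new(GtExpr {
            left: create_physical_expr(left, schema)?,
            right: create_physical_expr(right, schema)?,
        })),
    }
}
```

In this sketch the translation only goes one way: once names are resolved to indexes, the physical tree no longer needs the logical one, which is why an operator could accept the physical form alone.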
@yjshen when I first introduced parquet predicate push-down for DataFusion in https://github.com/apache/arrow/pull/9064, I was focused on reusing existing code for evaluating dynamically generated expressions. I don't remember any particular reason for using logical instead of physical expressions. If you think using physical expressions (instead of logical) would make the code better, that's fine with me.
Thanks @yordan-pavlov for the context! I am asking here to make sure this proposal won't break any assumptions I'm not aware of.
We should avoid leaking logical expressions into physical operators:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_optimizer/pruning.rs#L93-L101
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/file_format/parquet.rs#L77-L86