apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Unifying Projections with File Reader Execs #8075

Open berkaysynnada opened 8 months ago

berkaysynnada commented 8 months ago

Is your feature request related to a problem or challenge?

While implementing the projection pushdown rule, I noticed a case worth discussing. Sometimes a projection sits directly above a file reader (such as CsvExec), and the projection's only job is to assign aliases. Couldn't the reader handle this itself? Readers can already apply projections internally, so if they could also define aliases, such plans would become simpler.

Describe the solution you'd like

When eliminating a projection that sits directly above a reader, if the projection only aliases columns, the new names can be applied to the reader's schema and its other related outputs. The one exception is a projection that evaluates an expression, since that cannot be done during the read itself.
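
As a rough illustration of the rule described above, here is a minimal sketch in plain Rust. It does not use DataFusion's actual types: `Expr`, `Reader`, and `fold_projection_into_reader` are hypothetical stand-ins invented for this example. The assumption is that a reader can be described by the file column indices it reads plus the output names it exposes, so an alias-only projection can be folded in by remapping indices and renaming outputs, while any computed expression blocks the rewrite.

```rust
/// Hypothetical, simplified stand-ins for plan nodes.
/// These are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to a column of the projection's input, by index.
    Column(usize),
    /// Any evaluated expression (e.g. `a + 1`); a reader cannot compute this.
    Computed(String),
}

#[derive(Debug, Clone, PartialEq)]
struct Reader {
    /// Indices into the file's columns, in output order.
    projection: Vec<usize>,
    /// Output column names, one per projected index.
    output_names: Vec<String>,
}

/// Try to fold a projection, given as `(expression, alias)` pairs, into the
/// reader directly beneath it. Returns `None` when any expression is more
/// than a plain column reference, or when a column index is out of range.
fn fold_projection_into_reader(
    exprs: &[(Expr, String)],
    reader: &Reader,
) -> Option<Reader> {
    let mut projection = Vec::with_capacity(exprs.len());
    let mut output_names = Vec::with_capacity(exprs.len());
    for (expr, alias) in exprs {
        match expr {
            Expr::Column(i) => {
                // Translate the index into the reader's output back to
                // the underlying file column index.
                projection.push(*reader.projection.get(*i)?);
                // The alias becomes the reader's output column name.
                output_names.push(alias.clone());
            }
            // An evaluation must stay in a separate projection node.
            Expr::Computed(_) => return None,
        }
    }
    Some(Reader { projection, output_names })
}
```

With a reader that exposes file columns 0, 2, and 3, a projection selecting its second and first outputs under new names would fold into a reader over file columns 2 and 0 carrying those names. A projection containing a computed expression would be left in place.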

Describe alternatives you've considered

No response

Additional context

No response

alamb commented 8 months ago

I think the downside is that doing aliasing could complicate the code within all readers (not just the ones built into DataFusion).

What is the benefit of putting the aliasing within a scan?

berkaysynnada commented 8 months ago

> I think the downside is that doing aliasing could complicate the code within all readers (not just the ones built into DataFusion).
>
> What is the benefit of putting the aliasing within a scan?

There would be no noticeable performance change in practice, but plans in which a projection sits directly above a reader would be simplified. Some readers may be able to support aliasing with little effort.