apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Only recompute schema in `TypeCoercion` when necessary #10365

Open alamb opened 2 weeks ago

alamb commented 2 weeks ago

Is your feature request related to a problem or challenge?

As part of https://github.com/apache/datafusion/issues/10210, we are trying to make the optimizer faster by speeding up the individual optimizer passes.

https://github.com/apache/datafusion/pull/10356 avoids a bunch of copies in the `TypeCoercion` pass.

@peter-toth pointed out in https://github.com/apache/datafusion/pull/10356/files#r1588892502 that this pass still does more work than necessary: it always recomputes the schema, even when it made no changes.

The root cause is that the expression rewrite performed by `TypeCoercionRewriter` does not return `Transformed`, so we must conservatively assume that the schema needs to be recomputed.

Describe the solution you'd like

  1. Change `TypeCoercionRewriter` to return `Transformed` somehow
  2. Only call `LogicalPlan::recompute_schema` when an expression was actually transformed
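
To illustrate the idea, here is a minimal, self-contained sketch of the two steps. It does not use the real DataFusion types: `Transformed`, `Expr`, `Plan`, `coerce`, and `type_coercion` below are simplified, hypothetical stand-ins, and the `schema_recomputes` counter exists only to make the skipped work observable. The actual `TypeCoercionRewriter` change may well look different.

```rust
// Simplified stand-in for DataFusion's `Transformed<T>`; the real type
// carries more state than this sketch needs.
#[derive(Debug, PartialEq)]
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    fn yes(data: T) -> Self {
        Self { data, transformed: true }
    }
    fn no(data: T) -> Self {
        Self { data, transformed: false }
    }
}

// Hypothetical toy expression type, not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int32(i32),
    Int64(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Step 1: the rewrite reports whether it changed anything.
// The toy "coercion" widens every Int32 literal to Int64.
fn coerce(expr: Expr) -> Transformed<Expr> {
    match expr {
        Expr::Int32(v) => Transformed::yes(Expr::Int64(i64::from(v))),
        Expr::Int64(v) => Transformed::no(Expr::Int64(v)),
        Expr::Add(l, r) => {
            let l = coerce(*l);
            let r = coerce(*r);
            // A parent counts as transformed if either child was.
            let transformed = l.transformed || r.transformed;
            let data = Expr::Add(Box::new(l.data), Box::new(r.data));
            Transformed { data, transformed }
        }
    }
}

// Hypothetical toy plan node, not DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    exprs: Vec<Expr>,
    // Counts schema recomputations so the saving is visible.
    schema_recomputes: usize,
}

impl Plan {
    fn recompute_schema(mut self) -> Self {
        self.schema_recomputes += 1;
        self
    }
}

// Step 2: only recompute the schema when some expression was transformed.
fn type_coercion(plan: Plan) -> Plan {
    let Plan { exprs, schema_recomputes } = plan;
    let mut any_transformed = false;
    let exprs: Vec<Expr> = exprs
        .into_iter()
        .map(|e| {
            let t = coerce(e);
            any_transformed |= t.transformed;
            t.data
        })
        .collect();
    let plan = Plan { exprs, schema_recomputes };
    if any_transformed {
        plan.recompute_schema()
    } else {
        plan
    }
}
```

In this sketch, a plan whose expressions are already `Int64` passes through `type_coercion` without touching the schema, while a plan containing an `Int32` literal triggers exactly one recomputation.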

Describe alternatives you've considered

No response

Additional context

No response