apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Performance issue with `replace_params_with_values` #13307

Open askalt opened 2 hours ago

askalt commented 2 hours ago

Since release 40.0.0, there has been a performance issue in the `replace_params_with_values` function. A patch added name preservation for expressions during placeholder substitution: https://github.com/apache/datafusion/commit/945902dd5d440bdc360cab60ef31cd0c3bceec41

Currently, `replace_params_with_values` works as follows:

It unconditionally copies the original name of the expression, even if the expression has no placeholders to substitute. This can hurt performance: by my measurements, it is about 2x slower on some plans that contain few placeholders or none at all.
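To make the cost concrete, here is a minimal sketch on a toy expression tree. It is not DataFusion's actual code: the `Expr` enum, `name`, `has_placeholders`, `substitute`, `replace_unconditional`, and `replace_guarded` are all hypothetical stand-ins, and the re-aliasing rule is an assumption about what "name preserving" means. The unconditional variant builds the expression's name before every substitution, while the guarded variant first checks for placeholders and returns early when there are none.

```rust
use std::collections::HashMap;

// Toy expression tree; a hypothetical stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Placeholder(String),
    Add(Box<Expr>, Box<Expr>),
    Alias(Box<Expr>, String),
}

impl Expr {
    // Builds the display name; this walks the tree and allocates strings.
    fn name(&self) -> String {
        match self {
            Expr::Column(c) => c.clone(),
            Expr::Literal(v) => v.to_string(),
            Expr::Placeholder(id) => id.clone(),
            Expr::Add(l, r) => format!("{} + {}", l.name(), r.name()),
            Expr::Alias(_, n) => n.clone(),
        }
    }

    // Cheap read-only check: does the tree contain any placeholder?
    fn has_placeholders(&self) -> bool {
        match self {
            Expr::Placeholder(_) => true,
            Expr::Add(l, r) => l.has_placeholders() || r.has_placeholders(),
            Expr::Alias(e, _) => e.has_placeholders(),
            Expr::Column(_) | Expr::Literal(_) => false,
        }
    }

    // Replaces every placeholder that has a value in `params`.
    fn substitute(self, params: &HashMap<String, i64>) -> Expr {
        match self {
            Expr::Placeholder(id) => match params.get(&id) {
                Some(v) => Expr::Literal(*v),
                None => Expr::Placeholder(id),
            },
            Expr::Add(l, r) => Expr::Add(
                Box::new(l.substitute(params)),
                Box::new(r.substitute(params)),
            ),
            Expr::Alias(e, n) => Expr::Alias(Box::new(e.substitute(params)), n),
            other => other,
        }
    }
}

// Always computes the original name, even when nothing will change.
// Assumption: the name is restored by aliasing when substitution changed it.
fn replace_unconditional(expr: Expr, params: &HashMap<String, i64>) -> Expr {
    let original_name = expr.name();
    let replaced = expr.substitute(params);
    if replaced.name() == original_name {
        replaced
    } else {
        Expr::Alias(Box::new(replaced), original_name)
    }
}

// Checks for placeholders first and skips the name work when there are none.
fn replace_guarded(expr: Expr, params: &HashMap<String, i64>) -> Expr {
    if !expr.has_placeholders() {
        return expr;
    }
    replace_unconditional(expr, params)
}
```

In this toy model both functions return the same result for every input; the guarded one simply avoids building names for expressions such as `a + 1` that contain no placeholders.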

I propose the following optimization: during the first pass we can determine whether the expression contains any placeholders, and

askalt commented 2 hours ago

Please see my implementation in https://github.com/apache/datafusion/pull/13308.