Open Dandandan opened 3 months ago
I've seen discussions about predicate reordering in the Calcite community before. One of the big problems is that when the engine reorders predicates, it overrides the order the user designed. A user who understands our short-circuit optimisation may have deliberately written the SQL with the better order, and engine reordering would undo that effort.
Good call. If we do it, it needs to be configurable so users/engines can disable the optimization.
We could potentially use some simple heuristics that would catch the common case -- like "treat regexp as very slow and evaluate it after other predicates"
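A minimal sketch of what such a heuristic with an opt-out switch could look like. Everything here is hypothetical: the `Predicate` enum, `cost_rank`, `reorder_conjuncts` and the `enabled` flag are illustrative names, not DataFusion's actual `Expr` type or configuration options, and the cost ranks are assumed rather than measured.

```rust
/// Hypothetical, simplified predicate kinds; this is NOT DataFusion's real `Expr` type.
#[derive(Debug, Clone, PartialEq)]
enum Predicate {
    /// Cheap comparison such as `code = 404`.
    Compare(String),
    /// Pattern match such as `URL LIKE '%google%'`.
    Like(String),
    /// Regular-expression match, assumed here to be the slowest kind.
    Regex(String),
}

/// Assumed relative cost ranks (lower is evaluated first); the values are illustrative only.
fn cost_rank(p: &Predicate) -> u8 {
    match p {
        Predicate::Compare(_) => 0,
        Predicate::Like(_) => 1,
        Predicate::Regex(_) => 2,
    }
}

/// Reorders the conjuncts of an AND chain by assumed cost.
/// When `enabled` is false, the user's original order is returned untouched.
fn reorder_conjuncts(mut conjuncts: Vec<Predicate>, enabled: bool) -> Vec<Predicate> {
    if enabled {
        // `sort_by_key` is a stable sort, so predicates with the same rank
        // keep the relative order the user wrote.
        conjuncts.sort_by_key(cost_rank);
    }
    conjuncts
}
```

Because the sort is stable, a user's hand-tuned order is only disturbed across cost classes, and passing `enabled = false` preserves it entirely.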
Is your feature request related to a problem or challenge?
After https://github.com/apache/datafusion/pull/11247 is merged we can look at ordering the boolean expressions according to a measure of evaluation cost.
Describe the solution you'd like
We can reorder expressions:
E.g. an expression like the following:
URL LIKE '%google%' AND code = 404
would likely be better reordered to
code = 404 AND URL LIKE '%google%'
in order to benefit most from short-circuiting, as code = 404 is less expensive to evaluate. One could also combine it with an estimate of selectivity to further optimize the order (low selectivity: batches more likely to be all false; high selectivity: batches more likely to be all true).
Describe alternatives you've considered
No response
Additional context
No response
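As a rough illustration of combining evaluation cost with selectivity, as suggested in the solution description, the sketch below orders AND conjuncts by `cost / (1 - selectivity)`. This is the classic textbook rank for row-at-a-time filters under an independence assumption, swapped in here for illustration; it is not something the issue specifies, and it does not model batch-level short-circuiting exactly. The `Estimated` struct, `rank`, `order_by_rank` and all numbers are hypothetical.

```rust
/// Hypothetical per-predicate estimates; a real planner would derive these from statistics.
#[derive(Debug, Clone, PartialEq)]
struct Estimated {
    sql: String,
    /// Assumed relative cost of evaluating the predicate.
    cost: f64,
    /// Assumed fraction of rows that pass the predicate, in [0, 1].
    selectivity: f64,
}

/// Rank for AND chains: cost paid per unit of rows filtered out (lower runs first).
fn rank(p: &Estimated) -> f64 {
    let filtered = 1.0 - p.selectivity;
    if filtered <= 0.0 {
        // A predicate that filters nothing never helps short-circuiting; run it last.
        f64::INFINITY
    } else {
        p.cost / filtered
    }
}

/// Sorts predicates by ascending rank; the sort is stable, so ties keep the user's order.
fn order_by_rank(mut preds: Vec<Estimated>) -> Vec<Estimated> {
    preds.sort_by(|a, b| rank(a).total_cmp(&rank(b)));
    preds
}
```

With these assumptions, a cheap predicate that filters out almost nothing can rank behind a more expensive one that rejects most rows.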