Since col1 is the same for each projection, the sort can be pushed down below the Union and the new plan will be like this:
ProgressiveEval <-- new operator (available in InfluxDB IOx that output data in their order)
Union
Sort(col2, col3, ... coln) <-- sort is pushed down but only from col2 to coln
Projection (col1, col2, col3, ... coln)
Sort(col2, col3, ... coln)
Projection (col1, col2, col3, ... coln)
.....
.....
Sort(col2, col3, ... coln)
Projection (col1, col2, col3, ... coln)
There are now many sorts but each only work on a subset of data in parallel. Also, the ProgressiveEval ensure the sort streams is ordered by col1 which is very cheap.
Is your feature request related to a problem or challenge?
In InfluxDB IOx, we want to improve this query plan:
Since
col1
is the same for each projection, the sort can be pushed down below the Union and the new plan will be like this:There are now many sorts but each only work on a subset of data in parallel. Also, the
ProgressiveEval
ensure the sort streams is ordered bycol1
which is very cheap.The above plan would work for us, however, we hit an issue in DataFusion that the
Sorts
underUnion
are always removed from the plan at the theenforce_sorting
step https://github.com/apache/arrow-datafusion/blob/09f5a544d25f36ff1d65cc377123aee9b0e8f538/datafusion/core/src/physical_optimizer/enforce_sorting.rs#L361.After some investigation, we found the reason the sorts under union are always removed because the Union does not implement
required_input_ordering
. It uses the default https://github.com/apache/arrow-datafusion/blob/179179c0b719a7f9e33d138ab728fdc2b0e1e1d8/datafusion/physical-plan/src/lib.rs#L155 which is always an array of NoneDescribe the solution you'd like
Implement
required_input_ordering
in UnionExec to have it ask its inputs to keep their sort order if the Union has output_orderingDescribe alternatives you've considered
No response
Additional context
No response