Open alamb opened 6 months ago
Here is a design document @mustafasrepo and I are working on: https://docs.google.com/document/d/1cIIJL6RKXKge-t8z4Rs_F-XH82giWQfmciwj-h3tho4/edit
@mustafasrepo and @ozankabak -- I went over and added a third design option (to move the order awareness into the aggregators), which I would like to consider as well. I think it is likely to perform significantly faster for many queries as well as keep the HashAggregateExec simpler (though it makes the aggregators themselves potentially more complicated)
Can you review https://docs.google.com/document/d/1cIIJL6RKXKge-t8z4Rs_F-XH82giWQfmciwj-h3tho4 and let me know what you think?
FYI @mustafasrepo and @ozankabak -- I filed https://github.com/apache/arrow-datafusion/issues/8777, about reusing CTEs, as it came up in another context and could also potentially be used for solution 2 in the design document https://docs.google.com/document/d/1cIIJL6RKXKge-t8z4Rs_F-XH82giWQfmciwj-h3tho4/edit
Is your feature request related to a problem or challenge?
Today DataFusion supports three aggregate functions that can be "order aware": ARRAY_AGG, FIRST_VALUE, and LAST_VALUE. This means that you can supply an ORDER BY clause to their argument, for example FIRST_VALUE(x ORDER BY time).
Today, only one single order can be specified across ALL order aware aggregate functions.
For example
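To make the desired semantics concrete, here is a minimal Python sketch of three order aware aggregates over a single group, each applying its own ordering. This is not DataFusion code and not any of the proposed designs. The rows, the column names (time, priority, x), and the helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical rows for one group, as (time, priority, x) tuples.
rows = [
    (3, 1, "c"),
    (1, 2, "a"),
    (2, 3, "b"),
]

def first_value(rows, value, order_by):
    """Illustrates FIRST_VALUE(value ORDER BY order_by) for one group."""
    return value(min(rows, key=order_by))

def last_value(rows, value, order_by):
    """Illustrates LAST_VALUE(value ORDER BY order_by) for one group."""
    return value(max(rows, key=order_by))

def array_agg(rows, value, order_by):
    """Illustrates ARRAY_AGG(value ORDER BY order_by) for one group."""
    return [value(r) for r in sorted(rows, key=order_by)]

# Accessors for the hypothetical columns.
by_time = lambda r: r[0]
by_priority = lambda r: r[1]
x = lambda r: r[2]

# Each aggregate uses a different ordering over the same input rows.
first_by_time = first_value(rows, x, by_time)
last_by_priority = last_value(rows, x, by_priority)
agg_by_time = array_agg(rows, x, by_time)
agg_by_priority = array_agg(rows, x, by_priority)
```

The point of the sketch is only that the same group of rows has to be viewed under more than one ordering at once, which a single ordering shared by all aggregates cannot express.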
Describe the solution you'd like
There are a few designs proposed here: https://github.com/apache/arrow-datafusion/pull/8558#issuecomment-1862649886
We are working on a more detailed proposal
Describe alternatives you've considered
No response
Additional context
No response