Closed alamb closed 2 weeks ago
take
Thanks @Rachelint -- I also have some time later this week to help with this issue (if it might help to have one or two examples as we work through https://github.com/apache/datafusion/issues/11151)
Hi @alamb , I have some questions about the alternatives mentioned in the issue. It seems the interface should be like:
trait AggregateExpr {
fn output_from_stats(&self, stats: &Statistics) -> Option<ScalarValue> { None }
...
}
As I understand, it is a optimization takes effect when specific aggregate functions(such as max, min, count...) can get the results directly from stats?
As I understand, it is a optimization takes effect when specific aggregate functions(such as max, min, count...) can get the results directly from stats?
Yes that is the case
I think output_from_stats
would also be reasonable and potentially more general. One challenge is that the AggregateExpr
might not have access to its argument exprs (and thus it might not know what the argument is) 🤔
As I understand, it is a optimization takes effect when specific aggregate functions(such as max, min, count...) can get the results directly from stats?
Yes that is the case
I think
output_from_stats
would also be reasonable and potentially more general. One challenge is that theAggregateExpr
might not have access to its argument exprs (and thus it might not know what the argument is) 🤔
It seems a function expressions
exists, and can help?
https://github.com/apache/datafusion/blob/569be9eb1ba60d3e842f22de5080c732ac2206f4/datafusion/physical-expr-common/src/aggregate/mod.rs#L121
and it is used to get min/max from stats now? https://github.com/apache/datafusion/blob/569be9eb1ba60d3e842f22de5080c732ac2206f4/datafusion/core/src/physical_optimizer/aggregate_statistics.rs#L250-L257
Sounds like it might work -- I think the only way to really know for sure would be to try it out
Sounds like it might work -- I think the only way to really know for sure would be to try it out
Thanks, I try it now.
Sounds like it might work -- I think the only way to really know for sure would be to try it out
Thanks, I try it now.
@Rachelint if you stop working on this could you please let me know?
Sounds like it might work -- I think the only way to really know for sure would be to try it out
Thanks, I try it now.
@Rachelint if you stop working on this could you please let me know?
Sorry for delay, still working on now, will try to push codes today...
@alamb we can close this too as fixed via #12296 right?
Thanks @edmondop
Is your feature request related to a problem or challenge?
Part of https://github.com/apache/datafusion/issues/11151 (where we are removing special case uses of Min/Max)
Describe the solution you'd like
Reemove these cases:
https://github.com/apache/datafusion/blob/569be9eb1ba60d3e842f22de5080c732ac2206f4/datafusion/core/src/physical_optimizer/aggregate_statistics.rs#L185-L186
https://github.com/apache/datafusion/blob/569be9eb1ba60d3e842f22de5080c732ac2206f4/datafusion/core/src/physical_optimizer/aggregate_statistics.rs#L197-L198
https://github.com/apache/datafusion/blob/569be9eb1ba60d3e842f22de5080c732ac2206f4/datafusion/core/src/physical_optimizer/aggregate_statistics.rs#L235-L236
https://github.com/apache/datafusion/blob/569be9eb1ba60d3e842f22de5080c732ac2206f4/datafusion/core/src/physical_optimizer/aggregate_statistics.rs#L247-L248
Specifically, the idea is to remove the pattern
and
Describe alternatives you've considered
I suggest adding a general purpose function to
AggregateExec
and then implementing it for Min and Max (so we can do the same for Min/Max UDAF)