apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.21k stars 957 forks source link

Move `Covariance` (Sample) `covar` / `covar_samp` to be a User Defined Aggregate Function #10372

Closed jayzhan211 closed 1 week ago

jayzhan211 commented 2 weeks ago

Which issue does this PR close?

Part of #8708

Rationale for this change

We are moving aggregate functions out of the core to ensure the core APIs are sufficient to implement all aggregates and make datafusion more configurable

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jayzhan211 commented 1 week ago

Thank you @jayzhan211

This looks great and I think it is a nice verification that we can extract aggregates from the code. I left some small suggestions but I don't think they are necessary -- I also updated the title and description of this PR to be more description

I believe this function (and related ones) are part of what @yyy1000 was working on when we started down the path to extract this type of thing from the core.

Thanks for your review. I plan to work on Sum or others that have additional features, but not too complex ones. covar_pop can be a good first issue.

alamb commented 1 week ago

Thanks again @jayzhan211