apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Implement array aggregate functions #7214

Open izveigor opened 1 year ago

izveigor commented 1 year ago

Is your feature request related to a problem or challenge?

Arrow DataFusion has many aggregate functions for scalars and columns. We can already compute an aggregate over an array by using the unnest function, but in my opinion it would be better to follow DuckDB and implement dedicated list_ aggregate functions.
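To illustrate the kind of per-row result such a function would give, here is a minimal sketch in plain Rust. It is not DataFusion or DuckDB code: the `ListRow` type and the `list_sum` and `list_sum_column` helpers are hypothetical names used only for illustration, and the NULL handling (skip NULL elements, return NULL for a NULL list or a list with no non-NULL elements) is an assumption modelled on the usual SQL SUM convention.

```rust
/// One row of a nullable list column: the list itself may be NULL,
/// and so may each element inside it. (Hypothetical illustration type.)
type ListRow = Option<Vec<Option<i64>>>;

/// Hypothetical per-row sum over one list value.
/// Assumed NULL handling: NULL elements are skipped; a NULL list or a
/// list without any non-NULL element yields NULL (None).
fn list_sum(row: &ListRow) -> Option<i64> {
    // A NULL list short-circuits to NULL.
    let values = row.as_ref()?;
    let mut total: Option<i64> = None;
    // `flatten` drops the NULL elements and yields only the present values.
    for v in values.iter().flatten() {
        total = Some(total.unwrap_or(0) + v);
    }
    total
}

/// Applies the per-row sum to every row of the column, producing one
/// output value per input row without unnesting the lists first.
fn list_sum_column(column: &[ListRow]) -> Vec<Option<i64>> {
    column.iter().map(list_sum).collect()
}
```

The point of the sketch is that the aggregate runs inside each row, so the output has exactly one value per input row and no separate unnest and regroup step is needed.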

List

The full list of array aggregate functions:

General:

Statistical:

Approximate:

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

DuckDB documentation: https://duckdb.org/docs/sql/functions/nested; Apache Arrow DataFusion aggregate functions: https://arrow.apache.org/datafusion/user-guide/sql/aggregate_functions.html

edmondop commented 10 months ago

This seems useful and something I can look into, can I pick it up @jayzhan211 ?

jayzhan211 commented 10 months ago

> This seems useful and something I can look into, can I pick it up @jayzhan211 ?

We started with sum but ran into quite a few challenges. You can look into this first: https://github.com/apache/arrow-datafusion/pull/7242. I plan to merge https://github.com/apache/arrow-datafusion/issues/7960 first and then continue on #7242. You are welcome to pick up any that you are interested in.