Weijun-H opened this issue 2 weeks ago
I wonder if we should consider where to draw the line on which aggregate functions to include in the core (i.e., should we include all these new functions?)
Now that all aggregate functions use the same API, we could potentially keep more specialized functions, such as those listed here, outside the core (either in their own crate or even in their own repo) and then have other code integrate them, e.g. https://github.com/apache/datafusion/issues/11979
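A toy model of the idea above, offered only as a sketch: once every aggregate implements one shared interface, the engine can keep a name-to-function registry, and an extension crate only has to export a function that registers its own entries. Everything here (`AggregateFn`, `Registry`, `register_extension_functions`, and the `range` aggregate) is hypothetical and much simpler than DataFusion's real API; in DataFusion itself this role is roughly played by user-defined aggregate functions registered on a `SessionContext`.

```rust
use std::collections::HashMap;

/// Hypothetical shared interface that every aggregate implements
/// (a toy stand-in, much simpler than DataFusion's real UDAF traits).
trait AggregateFn {
    fn name(&self) -> &str;
    fn evaluate(&self, values: &[i64]) -> Option<i64>;
}

/// A function that ships with the "core".
struct Max;

impl AggregateFn for Max {
    fn name(&self) -> &str {
        "max"
    }
    fn evaluate(&self, values: &[i64]) -> Option<i64> {
        values.iter().copied().max()
    }
}

/// A hypothetical specialized function that could live in a separate crate or repo.
struct Range;

impl AggregateFn for Range {
    fn name(&self) -> &str {
        "range"
    }
    fn evaluate(&self, values: &[i64]) -> Option<i64> {
        let max = values.iter().copied().max()?;
        let min = values.iter().copied().min()?;
        // None on empty input (via `?` above) or on i64 overflow.
        max.checked_sub(min)
    }
}

/// Name -> function lookup; the engine does not care where an entry came from.
#[derive(Default)]
struct Registry {
    functions: HashMap<String, Box<dyn AggregateFn>>,
}

impl Registry {
    fn register(&mut self, f: Box<dyn AggregateFn>) {
        self.functions.insert(f.name().to_string(), f);
    }

    /// Returns None for unknown names or when the aggregate itself yields None.
    fn call(&self, name: &str, values: &[i64]) -> Option<i64> {
        self.functions.get(name)?.evaluate(values)
    }
}

/// The single entry point a hypothetical extension crate would export.
fn register_extension_functions(registry: &mut Registry) {
    registry.register(Box::new(Range));
}

fn main() {
    let mut registry = Registry::default();
    registry.register(Box::new(Max)); // built in
    register_extension_functions(&mut registry); // opted in by the application
    let data = [3, 1, 3, 2];
    println!("{:?}", registry.call("max", &data)); // prints Some(3)
    println!("{:?}", registry.call("range", &data)); // prints Some(2)
}
```

The design point is that the core only defines the interface and the registry; which specialized functions are available becomes a choice of the application, not of the engine.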
I started a discussion about whether we should be adding all these functions directly in the core here: https://github.com/apache/datafusion/issues/12357
> I wonder if we should consider where to draw the line on which aggregate functions to include in the core (i.e., should we include all these new functions?)
> Now that all aggregate functions use the same API, we could potentially keep more specialized functions, such as those listed here, outside the core (either in their own crate or even in their own repo) and then have other code integrate them, e.g. #11979
I like this idea! 🚀
@Weijun-H, @dmitrybugakov and @dharanad: what do you think about creating a datafusion-functions-duckdb repo in datafusion-contrib, similar to https://github.com/datafusion-contrib/datafusion-functions-json for JSON from @samuelcolvin and co.?
It would be a pretty neat way to help build out the function library in DataFusion and would show off its extensibility.
I could then try to integrate it into dft, which @matthewmturner and I have been working on (https://github.com/datafusion-contrib/datafusion-dft); that would make it easier to use.
Originally from: https://github.com/apache/datafusion/pull/12476#issuecomment-2353611810
Thank you @alamb for proposing this initiative. I like this idea. What do others think?
It clearly draws a line between the core and the extensions, and we can still leverage those functions as extensions in dft.
Is your feature request related to a problem or challenge?
DataFusion now supports several aggregation functions, but it still lacks some common ones that are essential for a broader range of data processing tasks. To make DataFusion more versatile and capable of handling diverse workloads, it should include additional aggregation functions commonly used in data analysis, such as mode and max_by.
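To make the two examples concrete, here is a minimal sketch of what `mode` (most frequent value) and `max_by` (the value of one column on the row where another column is largest) compute, written over plain Rust slices rather than Arrow arrays. It does not reflect how DataFusion or DuckDB implement these functions; SQL details such as NULL handling are ignored, and the tie-breaking rules (smallest value for `mode`, first row for `max_by`) are assumptions of this sketch.

```rust
use std::collections::HashMap;

/// Most frequent value; on equal counts the smaller value wins
/// (a tie-breaking rule assumed for this sketch). None for empty input.
fn mode(values: &[i64]) -> Option<i64> {
    let mut counts: HashMap<i64, usize> = HashMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    counts
        .into_iter()
        // Higher count is "greater"; on a tie, the smaller value is "greater".
        .max_by(|(va, ca), (vb, cb)| ca.cmp(cb).then(vb.cmp(va)))
        .map(|(v, _)| v)
}

/// Returns the `arg` paired with the largest `val`.
/// On ties the first such row is kept (an assumption). None for empty input.
fn max_by<'a>(rows: &[(&'a str, i64)]) -> Option<&'a str> {
    let mut best: Option<(&'a str, i64)> = None;
    for &(arg, val) in rows {
        match best {
            // Keep the current best unless this row is strictly larger.
            Some((_, best_val)) if val <= best_val => {}
            _ => best = Some((arg, val)),
        }
    }
    best.map(|(arg, _)| arg)
}

fn main() {
    let scores = [4, 7, 4, 9, 7, 4];
    println!("mode = {:?}", mode(&scores)); // prints mode = Some(4)

    let sales = [("alice", 120), ("bob", 340), ("carol", 210)];
    println!("max_by = {:?}", max_by(&sales)); // prints max_by = Some("bob")
}
```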
Describe the solution you'd like
Describe alternatives you've considered
No response
Additional context
No response