apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
[Epic] Unify `WindowFunction` Interface (remove built in list of `BuiltInWindowFunction`s) #8709

Open alamb opened 6 months ago

alamb commented 6 months ago

Is your feature request related to a problem or challenge?

For many of the same reasons as listed on https://github.com/apache/arrow-datafusion/issues/8045, having two types of window functions ("built in" BuiltInWindowFunction and WindowUDF) is problematic for two reasons:

  1. There are some features that may not be available to User Defined Window Functions (such as reversing FIRST_VALUE and LAST_VALUE)
  2. Users cannot easily choose which window functions to include (which will likely be especially problematic as we work to add more functions)
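
To illustrate the first point, here is a minimal, self-contained sketch of what "reversing" means. The `SimpleWindowFn` enum, its `reverse` method, and `evaluate_reversed` are hypothetical names made up for this example and are not DataFusion's API. The sketch only shows that FIRST_VALUE evaluated over a reversed frame gives the same answer as LAST_VALUE over the original frame, which is the kind of rewrite a function has to be able to advertise for the optimization to apply.

```rust
/// Hypothetical stand-in for a window function; not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SimpleWindowFn {
    FirstValue,
    LastValue,
}

impl SimpleWindowFn {
    /// Returns the function that yields the same result when the frame
    /// is scanned in the opposite order.
    fn reverse(self) -> SimpleWindowFn {
        match self {
            SimpleWindowFn::FirstValue => SimpleWindowFn::LastValue,
            SimpleWindowFn::LastValue => SimpleWindowFn::FirstValue,
        }
    }

    /// Evaluates the function over a single window frame.
    fn evaluate(self, frame: &[i64]) -> Option<i64> {
        match self {
            SimpleWindowFn::FirstValue => frame.first().copied(),
            SimpleWindowFn::LastValue => frame.last().copied(),
        }
    }
}

/// Evaluates `f` by reversing both the function and the frame order.
/// The result should match `f.evaluate(frame)` on the original order.
fn evaluate_reversed(f: SimpleWindowFn, frame: &[i64]) -> Option<i64> {
    let reversed: Vec<i64> = frame.iter().rev().copied().collect();
    f.reverse().evaluate(&reversed)
}
```

In this toy model, `evaluate_reversed(f, frame)` and `f.evaluate(frame)` agree for both variants. A user-defined function with no equivalent of `reverse` could not take part in such a rewrite.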

Describe the solution you'd like

I propose moving DataFusion to only use WindowUDFs and removing BuiltInWindowFunction, for the same reasons as https://github.com/apache/arrow-datafusion/issues/8045

We will keep the existing WindowUDF interface as much as possible, while also potentially providing an easier way to define them.
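
As a rough sketch of the idea (not DataFusion's actual API), the block below models an engine with no hard-coded list of window functions. Every function, including ROW_NUMBER, implements the same trait and is registered by name, so users decide which ones to include. The names `WindowFn`, `RowNumber`, and `WindowRegistry` are hypothetical and exist only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical single interface that every window function implements.
/// This is an illustration only, not DataFusion's `WindowUDF`.
trait WindowFn {
    fn name(&self) -> &str;
    /// Produces one output value per row of a partition with `num_rows` rows.
    fn evaluate_partition(&self, num_rows: usize) -> Vec<u64>;
}

/// ROW_NUMBER written against the same interface a user would use.
struct RowNumber;

impl WindowFn for RowNumber {
    fn name(&self) -> &str {
        "row_number"
    }

    fn evaluate_partition(&self, num_rows: usize) -> Vec<u64> {
        // Rows are numbered starting at 1 within each partition.
        (1..=num_rows as u64).collect()
    }
}

/// Registry that starts empty: nothing is "built in".
#[derive(Default)]
struct WindowRegistry {
    functions: HashMap<String, Box<dyn WindowFn>>,
}

impl WindowRegistry {
    /// Registers a function under the name it reports.
    fn register(&mut self, f: Box<dyn WindowFn>) {
        self.functions.insert(f.name().to_string(), f);
    }

    /// Looks up a function by name and evaluates it over one partition.
    /// Returns `None` if no function with that name was registered.
    fn evaluate(&self, name: &str, num_rows: usize) -> Option<Vec<u64>> {
        self.functions
            .get(name)
            .map(|f| f.evaluate_partition(num_rows))
    }
}
```

With this shape, `registry.evaluate("row_number", 3)` returns `None` until `RowNumber` has been registered, which mirrors the goal of letting users opt in to the functions they need.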

Describe alternatives you've considered

Additional context

Proposed implementation steps:

comphead commented 4 months ago

I'd like to try a small POC and migrate ROW_NUMBER to WindowUDF trait

alamb commented 4 months ago

> I'd like to try a small POC and migrate ROW_NUMBER to WindowUDF trait

That would be awesome. Thank you!

I recommend trying to put it in its own crate if possible (datafusion-window-functions perhaps?), but that doesn't have to be part of the POC.