apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Convert `BuiltInWindowFunction::CumeDist` to a user defined window function #12695

Closed jcsherin closed 3 weeks ago

jcsherin commented 1 month ago

Is your feature request related to a problem or challenge?

Part of https://github.com/apache/datafusion/issues/8709

There is now no difference between "built in" / "prepackaged" scalar and aggregate functions in DataFusion. However, there are still some "built in" window functions; see the current source for `BuiltInWindowFunction` for the up-to-date list of what remains.

The problem with having two different kinds of window functions is:

  1. Some features that rely on the built-in implementation may not be available to user defined window functions
  2. Users cannot easily choose which window functions to include, or override their behavior if they need something different

Describe the solution you'd like

I would like to remove the "built in" version of this function and convert it to a user defined window function.

Describe alternatives you've considered

At a high level the process is:

  1. Add a new `WindowUDFImpl` in the `functions-window` crate
  2. Port the code from the relevant existing implementation of the built-in function in `datafusion/physical-expr/src/window`
  3. Remove the `BuiltInWindowFunction` variant and then get everything to compile (the compiler will show you where the existing implementations are)
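
For orientation while porting, the computation behind `cume_dist` is small: each row gets (number of rows up to and including its peer group) / (total rows in the partition). Below is a minimal standalone sketch of that logic, not the DataFusion implementation. The function name `cume_dist` and the `peer_groups` parameter are hypothetical, and it assumes the partition is already sorted and split into contiguous ranges of rows that tie on the `ORDER BY` key. It uses only the standard library and none of DataFusion's traits.

```rust
use std::ops::Range;

/// Hypothetical helper (an assumption for illustration, not a DataFusion API).
///
/// `num_rows` is the partition size; `peer_groups` are contiguous row ranges,
/// in order, where every row in a range ties on the ORDER BY key.
fn cume_dist(num_rows: usize, peer_groups: &[Range<usize>]) -> Vec<f64> {
    let total = num_rows as f64;
    let mut rows_seen = 0usize; // rows in this and all earlier peer groups
    let mut out = Vec::with_capacity(num_rows);

    for group in peer_groups {
        let len = group.end - group.start;
        rows_seen += len;
        // Every row in a peer group shares the same cumulative distribution.
        let value = rows_seen as f64 / total;
        out.extend(std::iter::repeat(value).take(len));
    }
    out
}
```

For example, with `ORDER BY` values `[10, 20, 20, 30]` the peer groups are `0..1`, `1..3`, and `3..4`, so `cume_dist(4, &[0..1, 1..3, 3..4])` yields `[0.25, 0.75, 0.75, 1.0]`. In the real port this logic would live behind whatever evaluator interface the `functions-window` crate expects, which this sketch does not attempt to reproduce.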

Additional context

Here are some good examples:

jcsherin commented 1 month ago

This is a good first issue.

SteNicholas commented 1 month ago

take

jonathanc-n commented 1 month ago

@SteNicholas Just checking, do you have time to do this? If not, I can try to work on it or work on it with you in parallel.