apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Library Guide: Extending DataFusion's operators: custom `LogicalPlan` and `ExecutionPlan`s #7308

Open alamb opened 1 year ago

alamb commented 1 year ago

Is your feature request related to a problem or challenge?

Part of https://github.com/apache/arrow-datafusion/issues/7014

If we want to have DataFusion used as the core of many new systems, we need it to be as easy as possible for someone to get their idea working on top of DataFusion.

Thanks to @tshauck we now have a basic Library Users Guide ❤️, and this ticket describes how to expand it.

Describe the solution you'd like

Fill in the content of https://arrow.apache.org/datafusion/library-user-guide/extending-operators.html

We can draw inspiration from https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs

Example Outline

  1. Introduce an example plan node that cannot be expressed with existing relational operators (maybe pivot rows to columns, like here)
  2. Show how to define the user-defined logical extension node
  3. Show how to use an extension physical planner to plan such a node (example here)
  4. Show how to create a simplified execution plan / stream
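
To make the shape of the outline above concrete, here is a minimal sketch that mirrors the four steps in miniature using only the Rust standard library. It deliberately does not use DataFusion's traits or Arrow types: `Row`, `PivotNode`, `PivotExec`, `plan_pivot`, and `pivot_demo` are all hypothetical names invented for illustration, and the real guide would implement the corresponding DataFusion extension points instead. The sketch also assumes the pivot columns are known at planning time and that repeated (key, pivot) pairs are summed.

```rust
use std::collections::BTreeMap;

/// A toy row: (group key, pivot key, value). Hypothetical; not an Arrow RecordBatch.
type Row = (String, String, i64);

/// Step 2 (toy): a "logical" node that only describes *what* to compute.
#[derive(Debug, Clone, PartialEq)]
struct PivotNode {
    /// Pivot keys that become output columns, in output order.
    columns: Vec<String>,
}

/// Step 4 (toy): a "physical" operator that knows *how* to compute it.
struct PivotExec {
    columns: Vec<String>,
}

/// Step 3 (toy): the planner turns the logical description into an operator.
fn plan_pivot(node: &PivotNode) -> PivotExec {
    PivotExec { columns: node.columns.clone() }
}

impl PivotExec {
    /// Returns one output row per group key, with one cell per pivot column.
    /// Missing combinations stay `None`; repeated combinations are summed.
    /// Pivot keys that are not listed in `columns` are ignored.
    fn execute(&self, input: &[Row]) -> Vec<(String, Vec<Option<i64>>)> {
        let mut groups: BTreeMap<String, Vec<Option<i64>>> = BTreeMap::new();
        for (key, pivot, value) in input {
            let cells = groups
                .entry(key.clone())
                .or_insert_with(|| vec![None; self.columns.len()]);
            if let Some(idx) = self.columns.iter().position(|c| c == pivot) {
                cells[idx] = Some(cells[idx].unwrap_or(0) + value);
            }
        }
        // BTreeMap iteration yields group keys in sorted order.
        groups.into_iter().collect()
    }
}

/// Step 1 (toy): rows that we want to turn into columns.
fn pivot_demo() -> Vec<(String, Vec<Option<i64>>)> {
    let input: Vec<Row> = vec![
        ("alice".into(), "q1".into(), 10),
        ("alice".into(), "q2".into(), 20),
        ("bob".into(), "q1".into(), 5),
        ("alice".into(), "q1".into(), 1),
    ];
    let node = PivotNode { columns: vec!["q1".into(), "q2".into()] };
    plan_pivot(&node).execute(&input)
}

fn main() {
    for (key, cells) in pivot_demo() {
        println!("{key}: {cells:?}");
    }
}
```

Splitting `PivotNode` from `PivotExec` is the point of the sketch: the logical node carries only the description that an optimizer could inspect, while the operator produced by the planner owns the execution strategy.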

The examples directory holds many more examples: https://github.com/apache/arrow-datafusion/tree/main/datafusion-examples

Describe alternatives you've considered

No response

Additional context

No response

brayanjuls commented 2 months ago

I was investigating pivoting in the DataFrame API and found that some of the links in this issue are broken. I'm leaving the replacements here for anyone who works on this in the future:

  1. pivot rows to columns, link
  2. how to use extension physical planner, link

alamb commented 2 months ago

Thanks @brayanjuls