apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Native PhysicalPlanner extension points #1077

Open mwiewior opened 1 week ago

mwiewior commented 1 week ago

What is the problem the feature request solves?

Extending Comet with custom operators requires a three-step procedure:

  1. Creating a custom SparkSessionExtensions class that modifies the SparkPlan object. This is not trivial, especially if you want to apply custom rules on top of a plan that has already been processed by the rules from CometSparkSessionExtensions: it requires rerunning native block merging, re-serializing the native operators for a given block, etc. Not easy, but doable.
  2. Extending the Protobuf messages. Currently this means making a copy of the Comet proto files and adding the new operators. Again not ideal, but easy to do.
  3. Extending PhysicalPlanner and its create_plan function, which maps each OpStruct to the native implementation of the operator. This is where the real problem lies.
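To make step 3 more concrete, here is a rough sketch of what an extension point in the planner could look like. It is not Comet's actual code: `OperatorNode`, `NativeOperator`, `OperatorBuilder`, and `PlannerRegistry` are hypothetical stand-ins invented for illustration, and the real planner works on the protobuf-generated types and DataFusion execution plans instead.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a deserialized operator message
/// (NOT Comet's real protobuf type).
#[derive(Debug, Clone)]
pub struct OperatorNode {
    /// Identifies which operator this node describes.
    pub kind: String,
    /// Operator-specific bytes that a custom builder would decode itself.
    pub payload: Vec<u8>,
}

/// Hypothetical stand-in for a native physical operator.
pub trait NativeOperator {
    fn name(&self) -> String;
}

/// The extension point: turns one operator node into a native operator.
pub trait OperatorBuilder: Send + Sync {
    fn build(&self, node: &OperatorNode) -> Result<Box<dyn NativeOperator>, String>;
}

/// Hypothetical registry that a planner could consult before (or instead of)
/// its built-in match over operator kinds.
#[derive(Default)]
pub struct PlannerRegistry {
    builders: HashMap<String, Arc<dyn OperatorBuilder>>,
}

impl PlannerRegistry {
    /// Registers a builder for an operator kind, replacing any previous one.
    pub fn register(&mut self, kind: &str, builder: Arc<dyn OperatorBuilder>) {
        self.builders.insert(kind.to_string(), builder);
    }

    /// Looks up the builder for the node's kind and delegates to it.
    pub fn create_plan(&self, node: &OperatorNode) -> Result<Box<dyn NativeOperator>, String> {
        match self.builders.get(&node.kind) {
            Some(builder) => builder.build(node),
            None => Err(format!("unsupported operator: {}", node.kind)),
        }
    }
}

/// Example custom operator and its builder.
pub struct MyScan;

impl NativeOperator for MyScan {
    fn name(&self) -> String {
        "MyScan".to_string()
    }
}

pub struct MyScanBuilder;

impl OperatorBuilder for MyScanBuilder {
    fn build(&self, _node: &OperatorNode) -> Result<Box<dyn NativeOperator>, String> {
        Ok(Box::new(MyScan))
    }
}
```

With something like this, a custom build would call `register("my_scan", Arc::new(MyScanBuilder))` once at startup, and `create_plan` would fall through to the registry for operator kinds it does not know. This only covers extensions compiled into the same binary; it says nothing about loading plugins at runtime.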

Describe the potential solution

There is a long-running discussion (in many threads on the Internet) about how a plugin mechanism should be implemented in Rust applications, given the lack of a stable Application Binary Interface (ABI), memory safety concerns, etc. See this, that and that. All of the approaches seem to have their pros and cons, and I'm too inexperienced in Rust to judge which one is best for Comet. I'm happy to help implement it once we agree on the strategy.
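For reference, one commonly discussed way around the ABI problem is to restrict the plugin boundary to C-compatible types and check a version number before using anything else. The sketch below only illustrates that idea and is not a proposal for the final design: `PluginDescriptor`, `HOST_ABI_VERSION`, `example_plugin_entry`, and `check_and_name` are all made-up names, and the actual dynamic loading of a shared library is left out.

```rust
use std::ffi::{c_char, CStr};

/// ABI version the host understands (hypothetical constant).
pub const HOST_ABI_VERSION: u32 = 1;

/// Hypothetical plugin descriptor. `#[repr(C)]` fixes the field layout so it
/// does not depend on the Rust compiler version used to build the plugin.
#[repr(C)]
pub struct PluginDescriptor {
    /// Version of this descriptor layout, checked by the host before use.
    pub abi_version: u32,
    /// Returns a pointer to a NUL-terminated, statically allocated name.
    pub name: extern "C" fn() -> *const c_char,
}

/// Plugin side: returns a pointer to a static NUL-terminated string.
extern "C" fn example_name() -> *const c_char {
    b"example_plugin\0".as_ptr() as *const c_char
}

/// Plugin side: the entry point a plugin would export. In a real shared
/// library this symbol would also need to be exported unmangled.
pub extern "C" fn example_plugin_entry() -> PluginDescriptor {
    PluginDescriptor {
        abi_version: HOST_ABI_VERSION,
        name: example_name,
    }
}

/// Host side: rejects descriptors with a different ABI version, otherwise
/// reads the plugin name across the C boundary.
pub fn check_and_name(desc: &PluginDescriptor) -> Result<String, String> {
    if desc.abi_version != HOST_ABI_VERSION {
        return Err(format!(
            "ABI version mismatch: host {}, plugin {}",
            HOST_ABI_VERSION, desc.abi_version
        ));
    }
    let ptr = (desc.name)();
    if ptr.is_null() {
        return Err("plugin returned a null name".to_string());
    }
    // SAFETY: by the contract documented on `PluginDescriptor::name`, the
    // pointer refers to a NUL-terminated string that lives for the whole
    // program, and it was checked for null above.
    let name = unsafe { CStr::from_ptr(ptr) };
    Ok(name.to_string_lossy().into_owned())
}
```

The trade-off is that everything crossing the boundary has to be expressed in C-compatible terms, which is safe against compiler-version differences but much less convenient than passing Rust traits around.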

Additional context

andygrove commented 1 week ago

Thanks for filing this @mwiewior. Having a plugin mechanism would allow us to support Rust UDFs as well, which would be a compelling feature for Comet.