apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

Will Comet support closed-source forks of Apache Spark (e.g. CSP versions)? #414

Open andygrove opened 3 weeks ago

andygrove commented 3 weeks ago

What is the problem the feature request solves?

We have our first PR up that works around an issue with Comet working with AWS Spark (https://github.com/apache/datafusion-comet/pull/412).

I think we need to carefully consider our stance on supporting closed-source forks of Spark from the cloud service providers.

Supporting closed-source Spark versions is challenging for many reasons:

If the community desires to maintain Comet versions that can work with CSP Spark versions, then I think we would need to find an approach that allows those contributors to extend the "core" Comet project and add CSP support without adding maintenance burden for the core project.

One idea, for example, would be to keep the core datafusion-comet project compatible with OSS Apache Spark, and then have specific downstream repositories such as datafusion-comet-aws that extend the project to support a specific CSP.

Describe the potential solution

No response

Additional context

No response

viirya commented 3 weeks ago

Thanks @andygrove for creating this.

I don't think we claim that Comet supports closed-source forks of Spark right now. It would be impossible to make such a claim, as we don't have the resources to verify it.

As for #412, although it was proposed in order to support AWS Spark, the patch can really be seen as a guard against picking up unexpected constructors that have different parameters. I think it makes sense to do.
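To illustrate the general idea of guarding against unexpected constructors, here is a minimal Java sketch. It is not the actual change in #412. The `ConstructorGuard` and `Scan` names are hypothetical stand-ins, and the only real API used is `java.lang.reflect`. Instead of taking whichever constructor a class happens to expose, the code asks for the exact parameter list it expects and fails clearly if that signature is absent:

```java
import java.lang.reflect.Constructor;
import java.util.Optional;

final class ConstructorGuard {

    // Hypothetical stand-in for a class whose constructors may differ between builds.
    public static class Scan {
        public final String path;
        public final int partitions;

        public Scan(String path) {
            this(path, 1);
        }

        public Scan(String path, int partitions) {
            this.path = path;
            this.partitions = partitions;
        }
    }

    // Returns the public constructor with exactly these parameter types, if one exists.
    static <T> Optional<Constructor<T>> findExact(Class<T> cls, Class<?>... paramTypes) {
        try {
            return Optional.of(cls.getConstructor(paramTypes));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    // Builds a Scan only through the (String, int) constructor; any other shape is rejected.
    static Scan newScan(String path, int partitions) {
        Optional<Constructor<Scan>> ctor = findExact(Scan.class, String.class, int.class);
        if (!ctor.isPresent()) {
            throw new UnsupportedOperationException("Scan(String, int) constructor not found");
        }
        try {
            return ctor.get().newInstance(path, partitions);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to invoke Scan(String, int)", e);
        }
    }
}
```

With this shape, a build that changes the constructor signature produces an explicit "not found" error rather than a call into a constructor with different parameters.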

viirya commented 3 weeks ago

There is also a reported compatibility issue with Databricks Spark: #190

andygrove commented 3 weeks ago

I plan on creating a PR to update our documentation to make it clear that we only support Apache Spark and not other Spark implementations.

parthchandra commented 2 weeks ago

I agree that we cannot support (i.e. guarantee compatibility with) proprietary forks, but I think it is OK to accept PRs like #412, since it doesn't break anything and can only increase adoption. If the volume of such PRs becomes too large, we can consider a contrib directory or even another repo.