fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0
1.98k stars, 94 forks

[FEATURE] Move Fugue SQL dependencies into extra `[sql]` and functions to become soft dependencies #482

Closed goodwanghan closed 9 months ago

goodwanghan commented 1 year ago

Fugue has become the distributed backend of a few high-visibility open source projects. These projects use only the transform function, not Fugue SQL. However, we recently moved the Fugue SQL dependencies into the core dependencies to simplify installation.

To be specific, these are the dependencies that are only useful when people use Fugue SQL: qpd, fugue-sql-antlr, sqlglot, and jinja2.

This has caused issues, both from breaking changes in these SQL-only dependencies and from conflicting dependencies (e.g. antlr4-python3-runtime version conflicts).

To minimize the dependency impact on the core features of Fugue, we may need to bring back the sql extra. Ideally, the Fugue core dependencies should exclude the SQL-related dependencies, and this should not affect the use of transform. When users need Fugue SQL, they should explicitly install fugue[sql]. On the other hand, we can still import fsql directly from fugue instead of fugue_sql. That is why we call them soft dependencies.
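A minimal sketch of what a soft dependency could look like in code, assuming the SQL packages are imported on demand rather than at module import time. The names `load_soft_dependency`, `transform_stub`, and `fsql_stub` are hypothetical stand-ins for illustration, not Fugue's actual internals, and the import names of the SQL packages are assumptions.

```python
import importlib
from types import ModuleType

# Hypothetical hint text; the install command follows the `fugue[sql]` extra named above.
_SQL_EXTRA_HINT = "Fugue SQL needs extra packages; install them with: pip install fugue[sql]"


def load_soft_dependency(module_name: str) -> ModuleType:
    """Import a module on demand and raise a helpful error if it is absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"'{module_name}' is not installed. {_SQL_EXTRA_HINT}") from exc


def transform_stub(rows, func):
    """Stand-in for a core feature: it never touches the SQL packages."""
    return [func(row) for row in rows]


def fsql_stub(query: str) -> str:
    """Stand-in for a SQL entry point: dependencies are resolved only when called."""
    # Import names are assumed to match the package names (with '-' replaced by '_').
    for name in ("qpd", "fugue_sql_antlr", "sqlglot", "jinja2"):
        load_soft_dependency(name)
    return query
```

With this shape, importing the module that defines `fsql_stub` always succeeds, and the missing-package error appears only when a SQL feature is actually used.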

To make this breaking change, we need two steps.

  1. In versions <0.9, we keep the Fugue SQL dependencies in the core dependencies but make them soft dependencies inside the code, so that if those packages are manually removed, the transform part still works. We should make this change and add the corresponding tests to handle the breaking scenarios.
  2. In versions >=0.9, we will move the Fugue SQL dependencies qpd, fugue-sql-antlr, sqlglot and jinja2 out of the core dependencies and into the extra [sql].
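The two steps above could be sketched as follows. For step 1, a test helper can simulate "manually removed" packages by placing `None` in `sys.modules`, which makes a later import of that top-level name raise `ImportError`. For step 2, a small function splits a requirement list into core requirements and the `sql` extra. Both `hide_modules` and `split_requirements` are hypothetical helpers, the requirement names are assumed to be bare (no version specifiers), and `placeholder-core-dep` is an illustrative stand-in rather than a real Fugue dependency.

```python
import importlib
import sys
from contextlib import contextmanager

# The SQL-only packages listed in this issue.
SQL_ONLY = ("qpd", "fugue-sql-antlr", "sqlglot", "jinja2")

_ABSENT = object()  # sentinel: the module was not in sys.modules beforehand


@contextmanager
def hide_modules(*names):
    """Step 1 test helper: make importing each top-level name fail temporarily."""
    saved = {name: sys.modules.get(name, _ABSENT) for name in names}
    for name in names:
        sys.modules[name] = None  # a None entry makes the import raise ImportError
    try:
        yield
    finally:
        # Restore the previous state so other tests are unaffected.
        for name, module in saved.items():
            if module is _ABSENT:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module


def split_requirements(requirements):
    """Step 2: separate core requirements from those destined for the sql extra."""
    core = [r for r in requirements if r not in SQL_ONLY]
    sql = [r for r in requirements if r in SQL_ONLY]
    return {"install_requires": core, "extras_require": {"sql": sql}}


# Usage: the result has the shape of keyword arguments accepted by setuptools.setup.
packaging = split_requirements(["placeholder-core-dep", *SQL_ONLY])
```

A step 1 test would then run the transform code path inside `with hide_modules(...)` using the import names of the SQL packages and assert that it still succeeds.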