MrPowers opened this issue 1 year ago
I am not sure what would be the correct way, but maybe we could apply something like:
```toml
[tool.poetry.extras]
delta2.2-spark3.3 = ["delta-spark^2.2.0", "pyspark^3.3.1"]
delta2.1-spark3.3 = ["delta-spark^2.1.1", "pyspark^3.3.1"]
```

and then the user would just need to run `poetry update; poetry install -E delta2.1-spark3.3` to install the desired dependencies.
One of the pitfalls would be making sure that Python 3.9 works for every dependency combination, and we would need tests for each combination to confirm it works with the application code. What do you think?
I haven't given this idea a try yet, but looking at this issue gave me the impression that this could work.
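For what it's worth, Poetry extras normally list the *names* of optional dependencies declared under `[tool.poetry.dependencies]` rather than version specs, and a given dependency normally carries a single version constraint, so pinning a different version per extra isn't straightforward. A minimal sketch of the usual shape (the names and ranges here are illustrative, not a proposal for Mack's actual file):

```toml
[tool.poetry.dependencies]
python = "^3.9"
# Optional dependencies carry the version constraints...
pyspark = { version = ">=3.2,<3.4", optional = true }
delta-spark = { version = ">=2.0,<2.3", optional = true }

[tool.poetry.extras]
# ...and an extra just lists dependency names to enable.
spark-delta = ["pyspark", "delta-spark"]
```

A user would then run something like `poetry install -E spark-delta`.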
@joao-fm-santos - this blog post has more context on the issue from a usability perspective.
Mack will typically be included as a dependency in other projects' dependency files. I'm not sure how we set up a Python project to correctly install a specific Delta Lake version based on the PySpark version that the user specified...
@danielbeach - FYI, we're looking into this issue.
@alexott - feel free to provide suggestions.
@MrPowers thanks for the blog post, really helpful! Unless I understood the problem incorrectly, I believe adding extras would be a good way to solve this issue, as it allows users to use common Poetry syntax to install dependencies and choose the versions they prefer.
For example, a user could declare Mack in their pyproject.toml file like so:

```toml
[tool.poetry.dependencies]
mack = { version = "*", extras = ["delta2.2-spark3.3"] }
```

or add it from the command line:

```
poetry add 'mack[delta2.2-spark3.3]'
```
For the `pip` installation, I believe we would need to change `setup.cfg` to include extras, roughly like so.
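Something along these lines, assuming a declarative setuptools config (untested, and the version pins are illustrative):

```ini
[options.extras_require]
# Note: newer packaging tools normalize "." in extra names to "-",
# so these extras may end up exposed as e.g. delta2-2-spark3-3.
delta2.2-spark3.3 =
    delta-spark>=2.2.0,<2.3.0
    pyspark>=3.3.1,<3.4.0
delta2.1-spark3.3 =
    delta-spark>=2.1.1,<2.2.0
    pyspark>=3.3.1,<3.4.0
```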
I have not tried this, but let me know if I am missing the point here!
@joao-fm-santos - yea, `extras` could be the right way to solve this. I don't know.
We need a solution that will work in a variety of execution contexts:

- If a user runs `pip install mack`, they should get the required dependencies installed.
- A user should be able to run `pip install mack` on an existing PySpark cluster and get all the dependencies installed.

One of my other projects uses a library called findspark. Is it possible we need a library like finddelta?
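Purely as a hypothetical sketch of what a finddelta-style helper might do — inspect the installed versions at runtime and fail fast on a pairing that isn't in the compatibility matrix (the table below is a small illustrative subset, and nothing like this exists in Mack today):

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative subset of the Spark <-> Delta Lake compatibility matrix.
_COMPATIBLE_DELTA = {
    "3.2": {"1.2", "2.0"},
    "3.3": {"2.1", "2.2", "2.3"},
}


def check_spark_delta_pairing() -> None:
    """Raise if the installed pyspark / delta-spark versions are a known-bad pairing."""
    try:
        spark_minor = ".".join(version("pyspark").split(".")[:2])
        delta_minor = ".".join(version("delta-spark").split(".")[:2])
    except PackageNotFoundError as missing:
        raise RuntimeError(f"Required package is not installed: {missing}") from None

    if delta_minor not in _COMPATIBLE_DELTA.get(spark_minor, set()):
        raise RuntimeError(
            f"delta-spark {delta_minor} is not known to work with pyspark {spark_minor}; "
            "check the Delta Lake compatibility matrix."
        )
```

Mack could call something like this at import time to give a clear error instead of an obscure runtime failure.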
To recap the core problem: users have to supply a correct combination of Spark & Delta Lake versions for their setup to work (see the compatibility matrix).
Mack depends on PySpark & Delta Lake. We want Mack to work with a variety of Spark & Delta Lake combinations.
Right now, PySpark and Delta Lake are specified as ordinary dependencies in the pyproject.toml file (a rough sketch of that kind of setup is below). I'm not sure the best way to specify dependencies using Poetry to give our users the best Mack download experience. Thoughts?
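For context, a minimal sketch of the kind of plain Poetry dependency block this describes — the version numbers are illustrative assumptions, not Mack's actual pins:

```toml
[tool.poetry.dependencies]
python = "^3.9"
# One fixed combination: every user gets these exact versions,
# regardless of which Spark / Delta Lake pairing their cluster runs.
pyspark = "3.3.1"
delta-spark = "2.2.0"
```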