Not at all - this is extremely informative. I'm wondering if we should create an issue to add delta-lake to conda-forge then, eh?!
@dennyglee +1
https://github.com/jupyter/docker-stacks/issues/1746 We are waiting for a new release. One of the things holding me back from using Delta is that its releases don't track Spark releases. CC @mathbunnyru @Bidek56
@dennyglee - Great question. Here's my understanding of the Python dependency management situation:
From what I've seen, for web projects, something like Poetry is the clear winner: its lockfile gives you deterministic, reproducible environments.
For Python data projects, conda works best in my experience. I can send you this YAML file, and you can run
conda env create -f envs/pyspark-322-delta-121.yml
and you'll get a virtual environment that's roughly equivalent to mine. You'll definitely have PySpark 3.2.2 and Delta 1.2.1 in that environment, but there's no guarantee that conda resolves the dependencies that aren't pinned to a specific version the same way on your end.

Also note that conda and pip aren't mutually exclusive: you can pip install into a conda environment. That's what's being done in this environment file:
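A minimal sketch of what an environment file like `envs/pyspark-322-delta-121.yml` could look like. Only the PySpark and delta-spark pins come from this thread; the environment name, channel, and Python version are assumptions:

```yaml
# Hypothetical sketch of envs/pyspark-322-delta-121.yml -- the pyspark and
# delta-spark pins are from this thread; everything else is assumed.
name: pyspark-322-delta-121
channels:
  - conda-forge
dependencies:
  - python=3.9          # assumed interpreter version
  - pyspark=3.2.2       # pinned, resolved from the conda channel
  - pip
  - pip:
      # delta-spark is only on PyPI, so it gets pip installed into the conda env
      - delta-spark==1.2.1
```

With a file like that, `conda env create -f envs/pyspark-322-delta-121.yml` builds the environment and hands the `pip:` block off to pip inside it; `conda activate pyspark-322-delta-121` then drops you into it.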
We're pip installing delta-spark inside the conda environment because delta-spark is only published to PyPI and isn't published to conda.
Sorry for the rambling response, haha. Feel free to ask more questions. I am still learning about this.