databricks / bundle-examples

Examples of Databricks Asset Bundles

Add example to demonstrate use of Poetry #12

Closed pietern closed 10 months ago

atrbgithub commented 10 months ago

@pietern @andrewnester

Would it be possible to ensure that wheels built by Poetry have the correct dependencies for a target Databricks runtime environment? Does Databricks provide, or could you provide, example pyproject.toml files which would ensure that wheels pin the same package versions that Databricks installs for each of its supported runtimes?

If for example a wheel is packaged using:

poetry build --format wheel

and a Databricks job cluster is then created with that wheel as a library, installed when the cluster starts, what happens if the wheel has dependencies which clash with the packages provided as part of the runtime?

An extreme example might be a wheel that specifies a more recent version of pyspark than the runtime provides, which then gets installed along with the wheel.

It would be nice if a package could be specified within your pyproject.toml file, something like:

[tool.poetry.dependencies]
python = "~3.8"
databricks-runtime = "9.1"

databricks-runtime would have, as part of its dependencies, the locked versions of the packages that the runtime installs on nodes when they are created, or at least any of those packages which are publicly available. That would then give some confidence that the wheel you are building is going to be compatible with the target runtime.

pietern commented 10 months ago

@atrbgithub Interesting idea!

DBR comes with a large list of preinstalled packages, so I wouldn't want this pseudo dependency to depend on all of them. But if you do specify any one of them, it could pin that package to the version included in the specified DBR version. Coincidentally, I was discussing this with @fjakobs yesterday, and we were debating what the best place to intercept these would be. As an alternative, we could cross-check a wheel's dependencies against the DBR version at deploy time, and then propose a list of pinned dependencies to be added manually to pyproject.toml.
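
A minimal sketch of what such a deploy-time cross-check could look like, using only the Python standard library. Everything named here is an assumption for illustration: `EXAMPLE_RUNTIME_PINS`, `wheel_requirements`, and `propose_pins` are hypothetical, the pinned packages and versions are placeholders rather than the contents of any real DBR release, and nothing in this sketch is part of the Databricks CLI or bundles.

```python
import re
import zipfile
from email import message_from_string

# Hypothetical placeholder pins. These are NOT the real contents of any DBR
# release; a real check would need the package list for the target runtime.
EXAMPLE_RUNTIME_PINS = {
    "example-dataframe-lib": "1.2.3",
    "example-http-client": "2.0.1",
}

# Leading distribution name of a Requires-Dist entry, e.g. "foo" in "foo (>=1.0)".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Normalize a distribution name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_requirements(wheel_path):
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_name = next(
            n for n in wheel.namelist() if n.endswith(".dist-info/METADATA")
        )
        metadata = message_from_string(wheel.read(metadata_name).decode("utf-8"))
    return metadata.get_all("Requires-Dist") or []


def propose_pins(requirements, runtime_pins):
    """Propose 'name==version' pins for requirements the runtime also ships."""
    pins = {normalize(name): version for name, version in runtime_pins.items()}
    proposed = []
    for requirement in requirements:
        match = _NAME_RE.match(requirement)
        if match is None:
            continue  # skip entries this simple parser cannot read
        name = normalize(match.group(1))
        if name in pins:
            proposed.append(f"{name}=={pins[name]}")
    return proposed
```

With a wheel produced by `poetry build --format wheel`, calling `propose_pins(wheel_requirements(path_to_wheel), EXAMPLE_RUNTIME_PINS)` would return the list of pins to copy into pyproject.toml. The sketch only matches on package names; it does not evaluate version specifiers or environment markers, which a real check would have to handle.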