Closed — fbraza closed this pull request 2 years ago
Hey @fbraza. Quick question: how did you test that the `.whl` works on Databricks? Did it not complain about the missing dependencies, e.g. `typer`?
Hello @vikramaditya91
No, it did not complain about that. The `.whl` is just like a zip file that contains the compiled requirements as defined in your `pyproject.toml`. Everything should be present in the `.whl`. But I did not use your package as if I were running it from a terminal with Typer; rather, I imported the three template ETL functions that you defined, and they were imported without any issue.
I don't think it is relevant to use the CLI command in the Databricks environment. Moreover, the Spark session is already instantiated there. That is why I just tested whether it was possible to use the functions you defined, and it worked. ^_^
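Roughly, the test in the notebook looked like the following (module and function names here are hypothetical placeholders for whatever the generated skeleton actually exposes):

```python
# Run inside a Databricks notebook, where `spark` is already instantiated.
# Hypothetical module/function names; substitute the skeleton's real ones.
from my_project.jobs import extract_data, transform_data, load_data

raw = extract_data(spark)      # read the source data
clean = transform_data(raw)    # apply the transformation step
load_data(clean)               # write the result out
```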
@fbraza It is not just about using it as a CLI. For example, if the `pyproject.toml` says that it needs `numpy` or `boto3`, and I am using `numpy`/`boto3` for some reason in the ETL job, then it would complain that they are missing from the environment, I suppose. To package the list of dependencies from the `pyproject.toml`, I had this `make pack_req` target, which packages the dependencies into a `package.zip` file using the Docker image.
The question is for @sdebruyn, I think.
The `poetry build` command creates a `.whl` and a `.tar.gz` (sdist) file, but they only contain the source code, which is contained in here. The wheel also contains a METADATA file that tells the installer about the dependencies. If Databricks/Synapse is smart enough to infer and install these dependencies based on METADATA, then great. If not, I believe the dependencies that are packaged by `make pack_req` should be sent to Databricks/Synapse.
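For reference, you can check which dependencies the wheel declares by reading its METADATA directly; a minimal sketch (the wheel filename is hypothetical, adjust it to your `dist/` output):

```python
# Inspect the METADATA file that `poetry build` places inside the wheel.
import zipfile

# Hypothetical filename; use the actual wheel from your dist/ folder.
with zipfile.ZipFile("dist/my_package-0.1.0-py3-none-any.whl") as whl:
    metadata_name = next(
        n for n in whl.namelist() if n.endswith(".dist-info/METADATA")
    )
    metadata = whl.read(metadata_name).decode()

# The Requires-Dist lines are what pip-compatible installers use to
# resolve and install the package's dependencies.
for line in metadata.splitlines():
    if line.startswith("Requires-Dist:"):
        print(line)
```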
Thank you for your feedback @vikramaditya91. Concerning your point: if you install the package from the wheel, it also installs the dependencies. You can run the test locally: add `boto3` and `numpy`, build the `.whl`, then `pip install your_wheel.whl`, and you will see that it installs all the dependencies. So this should not be a problem once you are in Databricks.
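A quick way to verify this after the local `pip install` (the package name here is a hypothetical placeholder):

```python
# After `pip install dist/my_package-0.1.0-py3-none-any.whl`, pip has already
# resolved the Requires-Dist entries from the wheel's METADATA.
import importlib.metadata

# Lists the dependencies the installed package declared.
print(importlib.metadata.requires("my_package"))

# These imports would fail if pip had skipped the declared dependencies.
import boto3
import numpy
```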
On Databricks you can use `dbutils.library.install("dbfs:/path/to/your/library.whl")` or, for `.whl` files only, the `pip` magic command `%pip install /dbfs/path/to/your/library.whl`.
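In a notebook that could look like this (the DBFS path is hypothetical):

```python
# Databricks notebook cell; adjust the DBFS path to where you uploaded the wheel.
dbutils.library.install("dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl")
dbutils.library.restartPython()  # restart Python so the new package is importable
```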
@fbraza Indeed, I just confirmed this with a Databricks notebook. When a `.whl` package is uploaded, it installs all the dependencies based on the METADATA file (`pip install` uses the METADATA file too). So this PR is good for me.
Confirming it locally would not have been sufficient, because you want to emulate how Databricks installs the package: if it simply did an `unzip` on the `.whl` file, that would have been insufficient.
Good! Yes, I was trying on Databricks Community but could not get a cluster to spin up ^_^.
Thanks for running the test yourself!
This is a pull request to answer the issue @sdebruyn raised. I quickly tested the `.whl` on Databricks, and from there you can use your skeleton functions. Cheers