Closed: tarique-msci closed this issue 1 year ago.
Hi @tarique-msci, this is expected behavior on interactive (all-purpose) clusters. They are not recommended for job execution with wheels, as stated in the docs: they don't support removing a wheel without restarting the cluster. If you want to make a development run on an all-purpose cluster, use dbx execute instead (a minimal sketch is shown below). Here is a doc for reference: https://dbx.readthedocs.io/en/latest/guides/python/python_quickstart/#executing-code-on-databricks
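For illustration, a minimal sketch of such a development run; the workflow, task, and cluster names are placeholders, and the exact flags may differ slightly between dbx versions:

```bash
# Hypothetical workflow/task/cluster names - replace with your own.
# This executes the task's code in the context of the all-purpose cluster
# without installing the wheel as a permanent cluster library.
dbx execute my-etl-job \
    --task=main \
    --cluster-name="my-dev-cluster"
```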
I have a PySpark job that I deploy as a wheel package using dbx on Databricks. The deployment file looks something like the sketch below.
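(A minimal, hypothetical example along those lines; the workflow name, package name, entry point, and cluster settings are placeholders rather than the actual project values:)

```yaml
# conf/deployment.yml - hypothetical minimal example, not the actual file
build:
  python: "pip"
environments:
  default:
    workflows:
      - name: "my-etl-job"
        tasks:
          - task_key: "main"
            new_cluster:
              spark_version: "11.3.x-scala2.12"
              node_type_id: "i3.xlarge"
              num_workers: 2
            python_wheel_task:
              package_name: "my_package"
              entry_point: "main"
```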
I have configured a job cluster for the actual job run, but during development I use an all-purpose cluster. To do that I go to the Workflows UI on Databricks and swap out the cluster. For the first run this works fine. But if I modify the code, redeploy, swap the cluster, and run it again, the new wheel file is installed on the cluster without removing the earlier one, and the run executes the older implementation instead of the new one with my changes. To get the new code to run I have to manually uninstall the wheel from the cluster and restart it; only then does it work (a rough sketch of that workaround is below).
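For context, the manual workaround looks roughly like this; the cluster ID and wheel path are placeholders, and this uses the legacy databricks CLI, so the exact commands may differ by version:

```bash
# Hypothetical cluster ID and wheel path - replace with your own.
# Uninstall the stale wheel, then restart the cluster so the
# uninstall actually takes effect.
databricks libraries uninstall \
    --cluster-id 0123-456789-abcdefgh \
    --whl "dbfs:/FileStore/wheels/my_package-0.1.0-py3-none-any.whl"
databricks clusters restart --cluster-id 0123-456789-abcdefgh
```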
Example screenshot of the libraries installed on the cluster.
Expected Behavior
When a new version of the job is deployed and launched on a cluster, the older wheel should be uninstalled.
Current Behavior
The new version of the wheel package is installed on the cluster without removing the older version, and the run still uses the old wheel package.
Steps to Reproduce (for bugs)
After following the deploy / swap-cluster / redeploy cycle described above, you will have multiple versions of the same wheel package installed on the cluster; a rough sketch of that cycle follows.
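(The workflow name below is a placeholder; the cluster swap and the runs themselves were done through the Workflows UI, as noted in the comments:)

```bash
# 1. Deploy the workflow with the first version of the wheel.
dbx deploy my-etl-job

# 2. In the Workflows UI, swap the job cluster for an all-purpose
#    cluster and trigger a run (works as expected the first time).

# 3. Modify the code, rebuild the wheel, and redeploy.
dbx deploy my-etl-job

# 4. Trigger the run again from the UI: the new wheel is installed
#    alongside the old one, and the run still executes the old code.
```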
Context
Your Environment