rsayn opened this issue 3 weeks ago
Hi @rsayn! Thanks for reporting the issue. Just to confirm: when you run a workflow with this cluster, the libraries are not installed either?
Hey @andrewnester! If I define jobs to run on this cluster, I can include libraries in the job/task definition. However, my use case here is to boot a small interactive cluster for dev/debugging via attached notebooks, and I'd like to avoid the overhead of manually installing the project wheel that I deploy through DABs.
My request comes from the fact that you can specify cluster-scoped libraries from the Databricks UI, the SDK, or a cluster policy, but not via DABs.
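For illustration, a minimal sketch of the cluster-policy route mentioned above, using the Python SDK; the policy name, policy definition, and wheel path are all assumptions:

```python
import json

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library

w = WorkspaceClient()

# Clusters created under this policy get the listed libraries
# installed on (re)start. Name, definition, and wheel path are
# hypothetical.
w.cluster_policies.create(
    name="dev-with-project-wheel",
    definition=json.dumps({"spark_version": {"type": "unlimited"}}),
    libraries=[Library(whl="/Workspace/path/to/my_test_code.whl")],
)
```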
@rsayn thanks for clarifying, it makes sense. My expectation was that, with a configuration like yours, libraries would be installed when the cluster is started (i.e. when the corresponding job is started). If that's not the case, this has to be fixed on our side and I'll look into it.
All right, thanks a lot! To further clarify: I think (please confirm) all-purpose clusters can still be used for jobs.
In that case, I'd expect any library configured on the job's tasks to override the default cluster libraries (which I think is the current behaviour if you attach libraries to a cluster policy) 🤔
I think I might have misunderstood the original issue. In any case, even if you use an interactive cluster, you can use it in job tasks. But for libraries to be installed, you need to specify them in the `libraries` section of the tasks, not in the clusters, so it could look like this:
```yaml
resources:
  clusters:
    test_cluster:
      cluster_name: "test-cluster"
      spark_version: "13.3.x-snapshot-scala2.12"
      num_workers: 1
      data_security_mode: USER_ISOLATION

  jobs:
    some_other_job:
      name: "[${bundle.target}] Test Wheel Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: "${resources.clusters.test_cluster.cluster_id}"
          python_wheel_task:
            package_name: my_test_code
            entry_point: run
            parameters:
              - "one"
              - "two"
          libraries:
            - whl: ./dist/*.whl
```
Exactly. In my case I don't have any jobs attached to the cluster, so I can't use the setup you provided.
Hello @andrewnester, any news about this? 🙏 LMK if I can help in any way!
Describe the issue
Since 0.229.0, all-purpose (interactive) clusters can be created via DABs.

With job clusters, it's pretty straightforward to install a DAB wheel artifact by specifying `libraries` for a task executed on that cluster.

With all-purpose clusters this is currently not possible, and the only workaround is to perform post-deployment operations with the SDK or REST APIs to attach a library programmatically.
Configuration
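A minimal sketch of the kind of setup described in the thread; the cluster name and Spark version are assumptions:

```yaml
resources:
  clusters:
    test_cluster:
      cluster_name: "test-cluster"
      spark_version: "13.3.x-snapshot-scala2.12"
      num_workers: 1
      # No `libraries` field is accepted on a cluster resource today;
      # that gap is the subject of this request.
```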
Expected Behavior
There should be a way to specify the deployed bundle wheel as a dependency.
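For example, a hypothetical syntax (the `libraries` key on a cluster resource is the proposal here, not an existing DABs field) could look like:

```yaml
resources:
  clusters:
    test_cluster:
      cluster_name: "test-cluster"
      spark_version: "13.3.x-snapshot-scala2.12"
      num_workers: 1
      # Hypothetical: install the bundle's built wheel on the
      # interactive cluster at start, with DABs resolving the path.
      libraries:
        - whl: ./dist/*.whl
```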
Actual Behavior
There's currently no way to specify this behaviour. The wheel needs to be post-attached to the cluster via the SDK by:

1. looking up the cluster ID from the cluster name, and
2. installing the deployed wheel on the cluster from its upload path.

Note that both steps would greatly benefit from the substitution happening inside DABs; without it, the cluster name and library path have to be inferred somehow.
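A minimal sketch of that workaround with the Python `databricks-sdk`; the cluster name and wheel path below are assumptions that DABs substitution would otherwise resolve:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library

w = WorkspaceClient()

# Step 1: resolve the cluster ID from its name. The name is an
# assumption; inside DABs,
# ${resources.clusters.test_cluster.cluster_id} would be available.
cluster_id = next(
    c.cluster_id for c in w.clusters.list() if c.cluster_name == "test-cluster"
)

# Step 2: attach the deployed bundle wheel. The workspace path is an
# assumption; the real path depends on the bundle's root_path and the
# built artifact's file name.
w.libraries.install(
    cluster_id=cluster_id,
    libraries=[
        Library(whl="/Workspace/.bundle/my_bundle/dev/artifacts/my_test_code-0.1.0-py3-none-any.whl")
    ],
)
```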
OS and CLI version
Is this a regression?
No, this is a new feature request
Debug Logs
N/A