Libraries requirements not replaced by workspace path #1517

Open · flvndh opened this issue 5 months ago

flvndh commented 5 months ago

Describe the issue

I use Databricks Asset Bundles (DAB) to deploy a Python task and specify my Python dependencies in a requirements.txt file. When I deploy the bundle, the path to the requirements file is not replaced by its workspace counterpart.

Configuration

Here is my job specification:

# yaml-language-server: $schema=../../bundle_config_schema.json

resources:
  jobs:
    ingest:
      name: ingest

      tasks:
        - task_key: ingest
          job_cluster_key: single_node_cluster
          libraries:
            - requirements: ../../requirements.txt
          spark_python_task:
            python_file: ../../runner.py
            parameters:
              - "--execution-time"
              - "{{ job.trigger.time.iso_datetime }}"
              - "--environment"
              - "${bundle.target}"

      job_clusters:
        - job_cluster_key: single_node_cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 0
            spark_conf:
              spark.master: local[*, 4]
              spark.databricks.cluster.profile: singleNode
            custom_tags:
              ResourceClass: SingleNode
            azure_attributes:
              first_on_demand: 1
              availability: SPOT_WITH_FALLBACK_AZURE
              spot_bid_max_price: -1
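
For context, the ../../ paths above suggest this job specification lives two directories below the bundle root. A minimal, hypothetical top-level databricks.yml for such a layout could look like the following (the bundle name, file names, target, and host are placeholders, not taken from the issue):

# databricks.yml at the bundle root (hypothetical sketch)
bundle:
  name: my_bundle

include:
  - resources/jobs/*.yml   # e.g. resources/jobs/ingest.yml contains the job above

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net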

Steps to reproduce the behavior

  1. Run databricks bundle deploy
  2. See the dependent libraries path in the Jobs UI

Expected Behavior

The dependent libraries path should be

/Users/<username>/.bundle/<bundle name>/<target>/files/requirements.txt

Actual Behavior

The dependent libraries path is

../../requirements.txt

OS and CLI version

OS: Debian 12
CLI version: v0.221.1

pietern commented 5 months ago

Thanks for reporting. This field was added recently and we haven't yet added the path rewriting logic for it.

We'll work on a fix. In the meantime, you can unblock yourself by performing the interpolation manually:

      tasks:
        - task_key: ingest
          job_cluster_key: single_node_cluster
          libraries:
            - requirements: /Workspace${workspace.file_path}/requirements.txt
          spark_python_task:
            python_file: ../../runner.py
            parameters:
              - "--execution-time"
              - "{{ job.trigger.time.iso_datetime }}"
              - "--environment"
              - "${bundle.target}"

dinjazelena commented 5 months ago

Hey, the same thing happens to the library path when using the for_each_task feature.
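
Since the screenshot is not included here, the sketch below illustrates the kind of configuration this refers to: a for_each_task whose nested task declares a requirements library with a relative path. The task keys, inputs, and cluster key are placeholders, not taken from the original comment.

      tasks:
        - task_key: ingest_each
          for_each_task:
            inputs: '["2024-01-01", "2024-01-02"]'
            task:
              task_key: ingest_each_iteration
              job_cluster_key: single_node_cluster
              libraries:
                - requirements: ../../requirements.txt  # also left unresolved after deploy
              spark_python_task:
                python_file: ../../runner.py
                parameters:
                  - "{{input}}"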

witi83 commented 4 months ago

@pietern This might also be needed: https://github.com/databricks/cli/pull/1543