databricks / cli

auto:latest-lts spark_version support #1327

Open YaroBear opened 5 months ago

YaroBear commented 5 months ago

Describe the issue

I am trying to deploy a DAB that creates a new job cluster using the policy ID of the Job Compute policy. That policy sets the following value for spark_version:

"spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },

I want my jobs to use the latest LTS Spark version, if possible, without having to pin an exact version in the DAB.

Setting spark_version: "auto:latest-lts" in my DAB does not work; I get the following error: "INVALID_PARAMETER_VALUE: Invalid spark version auto:latest-lts." For this to work correctly, I would expect the resulting bundle.tf.json to contain a reference like data.databricks_spark_version.latest.id, using the databricks_spark_version Terraform data source.
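
For illustration, here is a hand-written sketch of what I would expect the generated bundle.tf.json to be roughly equivalent to. The latest_lts label and the exact layout are my own guesses, not actual CLI output; the databricks_spark_version data source with long_term_support = true resolves to the newest LTS runtime:

{
  "//": "Illustrative sketch only; the latest_lts label and layout are guesses, not CLI output.",
  "data": {
    "databricks_spark_version": {
      "latest_lts": {
        "long_term_support": true
      }
    }
  },
  "resource": {
    "databricks_job": {
      "Test": {
        "name": "_Test",
        "job_cluster": [
          {
            "job_cluster_key": "test-cluster",
            "new_cluster": {
              "policy_id": "<job compute policy id>",
              "apply_policy_default_values": true,
              "node_type_id": "Standard_D8ads_v5",
              "num_workers": 4,
              "spark_version": "${data.databricks_spark_version.latest_lts.id}"
            }
          }
        ]
      }
    }
  }
}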

Omitting spark_version in my DAB produces a bundle.tf.json with an empty string ("spark_version": ""), and I get a similar error: "INVALID_PARAMETER_VALUE: Invalid spark version ."

Are there plans for the CLI to support this use case?
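
In the meantime, a possible workaround, assuming a pinned version is acceptable, is to route spark_version through a bundle variable (a sketch; the version string below is only an example, not a recommendation):

# Workaround sketch: pin an explicit LTS version via a bundle variable.
# "13.3.x-scala2.12" is only an example value.
variables:
  spark_version:
    description: "Pinned Spark version until auto:latest-lts is supported."
    default: "13.3.x-scala2.12"

# ...then reference it in the job cluster:
# new_cluster:
#   spark_version: ${var.spark_version}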

Configuration

bundle.yml:

bundle:
  name: Test

sync:
  include:
    - src/*.py

variables:
  cluster_policy_id:
    description: "The cluster policy used to create the cluster for the job."

resources:
  jobs:
    Test:
      name: _Test
      job_clusters:
        - job_cluster_key: test-cluster
          new_cluster:
            policy_id: ${var.cluster_policy_id}
            apply_policy_default_values: true
            node_type_id: Standard_D8ads_v5
            num_workers: 4
            spark_version: "auto:latest-lts"
      tasks:
        - task_key: _Test
          job_cluster_key: test-cluster
          notebook_task:
            notebook_path: "./src/test.py"

Steps to reproduce the behavior

  1. Run databricks bundle deploy --var "cluster_policy_id=<job compute policy id>"
  2. See the error: "INVALID_PARAMETER_VALUE: Invalid spark version auto:latest-lts."

Expected Behavior

The DAB should deploy to Databricks using the latest LTS Spark version.

Actual Behavior

The deployment fails with "INVALID_PARAMETER_VALUE: Invalid spark version auto:latest-lts."

OS and CLI version

Windows 10, Databricks CLI v0.211.0

Is this a regression?

No

andrewnester commented 5 months ago

JFYI: this has to be addressed in the Go SDK / API definition, where the Spark version is defined as an always-required field: https://github.com/databricks/databricks-sdk-go/blob/a823ca32fc4199d8cf2269b78cfe89331b4b688a/service/compute/model.go#L1544-L1547
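
A minimal sketch of the consequence, using a trimmed-down stand-in type rather than the SDK's actual definition: because the field is serialized without omitempty, an unset spark_version is sent to the API as an empty string, which lines up with the "Invalid spark version ." error above.

package main

import (
	"encoding/json"
	"fmt"
)

// clusterSpec is a trimmed-down stand-in for the SDK's cluster spec type,
// not the real definition. Because spark_version carries no `omitempty`,
// an unset value is serialized as "" rather than being dropped.
type clusterSpec struct {
	SparkVersion string `json:"spark_version"`
	NumWorkers   int    `json:"num_workers,omitempty"`
}

func main() {
	out, _ := json.Marshal(clusterSpec{NumWorkers: 4})
	fmt.Println(string(out))
	// Prints: {"spark_version":"","num_workers":4}
	// i.e. the API receives an empty spark_version, hence
	// "INVALID_PARAMETER_VALUE: Invalid spark version ."
}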

cc @mgyucht