databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] Issue with `databricks_cluster` and `databricks_library` resource - the libraries are not installed #4096

Open Pirognoe opened 1 month ago

Pirognoe commented 1 month ago

Configuration

We have a list of libraries defined like this:

```hcl
databricks_libraries_cluster_247 = [
  "confluent-kafka==2.5.3",
  "opencensus-ext-azure==1.1.13",
  "pymsteams==0.2.2",
  "zeep==4.1.0",
  "pandas==1.3.4",
  "lxml==4.9.1",
  "pyarrow==17.0.0",
  "openpyxl==3.1.5",
  "xmltodict==0.13.0",
  "pandarallel==1.6.5",
  "mgzip==0.2.1",
  "PureCloudPlatformClientV2==212.0.0",
  "azure-storage-queue==12.11.0",
  "parameterized==0.9.0",
  "pyyaml==6.0.2",
  "Jinja2==3.1.4",
]
```
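For completeness, a matching variable declaration (assuming the list above lives in a `.tfvars` file; the declaration itself is not shown in the original report) would look something like this:

```hcl
# Hypothetical declaration matching the tfvars value above.
variable "databricks_libraries_cluster_247" {
  type        = list(string)
  description = "PyPI packages (name==version) to install on the cluster"
}
```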

and later we define the library resource like this:

```hcl
resource "databricks_library" "stream_hub_dh2_library" {
  for_each = toset(var.databricks_libraries_cluster_247)

  cluster_id = databricks_cluster.stream_hub_dh2.id

  pypi {
    package = each.key
  }
}
```

and the cluster that uses those libraries:

```hcl
resource "databricks_cluster" "stream_hub_dh2" {
  cluster_name            = "stream_hub_dh2"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.stream_worker_16Gb.id
  driver_node_type_id     = data.databricks_node_type.stream_driver_28Gb.id
  autotermination_minutes = 10
  no_wait                 = true

  autoscale {
    min_workers = 1
    max_workers = 4
  }

  spark_conf = {
    "spark.streaming.stopGracefullyOnShutdown"         : "true"
    "spark.sql.session.timeZone"                       : "UTC"
    "spark.sql.parquet.mergeSchema"                    : "true"
    "spark.hadoop.parquet.enable.summary-metadata"     : "false"
    "spark.sql.streaming.fileSink.log.compactInterval" : "999999"
    "spark.sql.sources.partitionOverwriteMode"         : "dynamic"
  }

  azure_attributes {
    availability       = "SPOT_WITH_FALLBACK_AZURE"
    first_on_demand    = 1
    spot_bid_max_price = -1
  }
}
```
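As a possible alternative (a sketch, not part of the original report): the `databricks_cluster` resource also accepts inline `library` blocks, which can be generated with a `dynamic` block from the same variable, so library installation is managed together with the cluster instead of through separate `databricks_library` resources:

```hcl
# Sketch: attach the same packages via inline library blocks
# (other cluster arguments omitted for brevity).
resource "databricks_cluster" "stream_hub_dh2" {
  cluster_name            = "stream_hub_dh2"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.stream_worker_16Gb.id
  autotermination_minutes = 10

  autoscale {
    min_workers = 1
    max_workers = 4
  }

  # One library block per entry in the package list.
  dynamic "library" {
    for_each = toset(var.databricks_libraries_cluster_247)
    content {
      pypi {
        package = library.value
      }
    }
  }
}
```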

Expected Behavior

The cluster should be created (this happens) and ALL libraries should be assigned to it (even if not installed immediately).

Actual Behavior

```
│ Error: cannot create library: Cluster is in unexpected state Pending.
│
│   with databricks_library.stream_hub_dh2_library["pyyaml==6.0.2"],
│   on main.tf line 140, in resource "databricks_library" "stream_hub_dh2_library":
│  140: resource "databricks_library" "stream_hub_dh2_library" {
```

1. `no_wait` did not seem to work properly.
2. To me it looks like `autotermination_minutes` is not long enough for all the libraries to install, so the cluster shuts down mid-installation.

P.S. Another issue: when you bump a library version and run `terraform apply`, you usually end up with an installation error (the cluster is displayed with two versions of the same library, the newer one in red).

Steps to Reproduce

Terraform and provider versions

```hcl
terraform {
  required_version = "~>1.7"
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "1.53.0"
    }
  }
}
```

Is it a regression?

Debug Output

Important Factoids

Would you like to implement a fix?

alexott commented 1 month ago

Please remove `no_wait` - it was created for a specific case. Libraries can be installed only on a running or terminated cluster.
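Following that advice, the fix would simply be to drop the `no_wait` argument, so Terraform waits for the cluster to reach a usable state before the dependent `databricks_library` resources (linked via `cluster_id`) are created. A minimal sketch:

```hcl
# Sketch: same cluster as in the report, with no_wait removed so the
# provider waits for the cluster before dependent libraries are installed.
resource "databricks_cluster" "stream_hub_dh2" {
  cluster_name            = "stream_hub_dh2"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.stream_worker_16Gb.id
  autotermination_minutes = 10
  # no_wait = true   <- removed

  autoscale {
    min_workers = 1
    max_workers = 4
  }
}
```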

Pirognoe commented 1 month ago

P.S. Can you elaborate on what the specific case for it is? And maybe include it in the docs? Anyway, with or without `no_wait` the result seems to be the same - failure to install all the libraries and a failed `terraform apply`.