databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] apply makes immaterial changes to resources, even though neither the HCL file nor the resource has changed. #3560

Open vcolano opened 4 months ago

vcolano commented 4 months ago

I have a databricks_job resource and a related databricks_permissions resource defined in the following configuration. After applying the initial changes with terraform apply, I ran terraform plan and, to my surprise, changes were listed even though neither the configuration nor the actual resources themselves had changed at all. I can keep running terraform apply on this configuration over and over and see the same changes listed each time.

Configuration

variable "jobs_runner_service_principal_application_id" {
  description = "Application ID of the 'prod-jobs-runner' service principal."
  type        = string
  nullable    = false
}

resource "databricks_job" "model_orchestrator" {
  name = "prod-model-orchestrator"
  run_as {
    service_principal_name = var.jobs_runner_service_principal_application_id
  }
  webhook_notifications {
    on_failure {
      id = "273c8641-4f21-4438-9d96-03d4a806f24b"
    }
  }
  task {
    task_key = "prod-model-preprocess"
    run_if   = "ALL_SUCCESS"
    notebook_task {
      source        = "GIT"
      notebook_path = "services/model/preprocess"
    }
    library {
      pypi {
        package = "pydantic==2.5.2"
      }
    }
    library {
      pypi {
        package = "databricks-sdk==0.22.0"
      }
    }
    job_cluster_key = "prod-model-preprocess_cluster"
  }
  task {
    task_key = "prod-model-polling"
    run_if   = "ALL_SUCCESS"
    notebook_task {
      source        = "GIT"
      notebook_path = "services/model/polling"
    }
    library {
      pypi {
        package = "databricks-sdk==0.22.0"
      }
    }
    library {
      pypi {
        package = "pydantic==2.5.2"
      }
    }
    job_cluster_key = "prod-model-polling_cluster"
    depends_on {
      task_key = "prod-model-preprocess"
    }
  }
  task {
    task_key = "prod-model-combine"
    run_if   = "ALL_SUCCESS"
    notebook_task {
      source        = "GIT"
      notebook_path = "services/model/combine"
    }
    library {
      pypi {
        package = "pydantic==2.5.2"
      }
    }
    library {
      pypi {
        package = "databricks-sdk==0.22.0"
      }
    }
    job_cluster_key = "prod-model-combine_cluster"
    depends_on {
      task_key = "prod-model-polling"
    }
  }
  max_concurrent_runs = 100
  job_cluster {
    job_cluster_key = "prod-model-preprocess_cluster"
    new_cluster {
      spark_version = "12.2.x-scala2.12"
      spark_conf = {
        "spark.databricks.adaptive.autoOptimizeShuffle.enabled" = "true"
        "spark.databricks.cluster.profile"                      = "singleNode"
        "spark.master"                                          = "local[*, 4]"
      }
      runtime_engine = "STANDARD"
      policy_id      = "9C6308F7030051E9"
      # i3.2xlarge has 8 vCPU and 61 GB RAM
      node_type_id        = "i3.2xlarge"
      enable_elastic_disk = false
      data_security_mode  = "SINGLE_USER"
      custom_tags = {
        ResourceClass = "SingleNode"
      }
      aws_attributes {
        zone_id                = "auto"
        spot_bid_price_percent = 100
        first_on_demand        = 1
        availability           = "SPOT_WITH_FALLBACK"
      }
    }
  }
  job_cluster {
    job_cluster_key = "prod-model-polling_cluster"
    new_cluster {
      spark_version = "12.2.x-scala2.12"
      spark_conf = {
        "spark.databricks.cluster.profile" = "singleNode"
        "spark.master"                     = "local[*, 4]"
      }
      runtime_engine = "STANDARD"
      # m5d.large has 2 vCPU and 8 GB RAM
      node_type_id        = "m5d.large"
      enable_elastic_disk = false
      data_security_mode  = "SINGLE_USER"
      custom_tags = {
        ResourceClass = "SingleNode"
      }
      aws_attributes {
        zone_id                = "us-east-1f"
        spot_bid_price_percent = 100
        first_on_demand        = 1
        availability           = "SPOT_WITH_FALLBACK"
      }
    }
  }
  job_cluster {
    job_cluster_key = "prod-model-combine_cluster"
    new_cluster {
      spark_version = "12.2.x-scala2.12"
      spark_conf = {
        "spark.databricks.adaptive.autoOptimizeShuffle.enabled" = "true"
        "spark.databricks.cluster.profile"                      = "singleNode"
        "spark.master"                                          = "local[*, 4]"
      }
      runtime_engine = "STANDARD"
      policy_id      = "9C6308F7030051E9"
      # i3.2xlarge has 8 vCPU and 61 GB RAM
      node_type_id        = "i3.2xlarge"
      data_security_mode  = "SINGLE_USER"
      custom_tags = {
        ResourceClass = "SingleNode"
      }
      aws_attributes {
        zone_id                = "auto"
        spot_bid_price_percent = 100
        first_on_demand        = 1
        availability           = "SPOT_WITH_FALLBACK"
      }
    }
  }
  git_source {
    url      = "https://github.com/myorg/myrepo/"
    provider = "gitHub"
    branch   = "main"
  }
}

resource "databricks_permissions" "model_orchestrator" {
  job_id = databricks_job.model_orchestrator.id
  access_control {
    service_principal_name = var.jobs_runner_service_principal_application_id
    permission_level       = "IS_OWNER"
  }
  access_control {
    permission_level = "CAN_MANAGE_RUN"
    group_name       = "infrastructure"
  }
  access_control {
    permission_level = "CAN_VIEW"
    group_name       = "users"
  }
}

Expected Behavior

After the first apply, when I apply again there should be no changes, since neither the resources nor the configuration that defines them has changed.

Actual Behavior

Each time I run terraform apply, I get the same output:

databricks_service_principal.prod_platform_jobs_runner: Refreshing state... [id=5930489811099157]
module.prod_model.databricks_job.model_orchestrator: Refreshing state... [id=501652838178442]
module.prod_model.databricks_permissions.model_orchestrator: Refreshing state... [id=/jobs/501652838178442]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following
symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.prod_model.databricks_job.model_orchestrator will be updated in-place
  ~ resource "databricks_job" "model_orchestrator" {
        id                        = "501652838178442"
        name                      = "prod-model-orchestrator"
        # (9 unchanged attributes hidden)

      ~ job_cluster {
            # (1 unchanged attribute hidden)

          ~ new_cluster {
              ~ enable_elastic_disk          = true -> false
                # (14 unchanged attributes hidden)

                # (1 unchanged block hidden)
            }
        }

      ~ task {
          ~ job_cluster_key           = "prod-model-combine_cluster" -> "prod-model-preprocess_cluster"
          ~ task_key                  = "prod-model-combine" -> "prod-model-preprocess"
            # (5 unchanged attributes hidden)

          - depends_on {
              - task_key = "prod-model-polling" -> null
            }

          ~ notebook_task {
              ~ notebook_path   = "services/model/combine" -> "services/model/preprocess"
                # (2 unchanged attributes hidden)
            }

            # (5 unchanged blocks hidden)
        }
      ~ task {
          ~ job_cluster_key           = "prod-model-preprocess_cluster" -> "prod-model-combine_cluster"
          ~ task_key                  = "prod-model-preprocess" -> "prod-model-combine"
            # (5 unchanged attributes hidden)

          + depends_on {
              + task_key = "prod-model-polling"
            }

          ~ notebook_task {
              ~ notebook_path   = "services/model/preprocess" -> "services/model/combine"
                # (2 unchanged attributes hidden)
            }

            # (5 unchanged blocks hidden)
        }

        # (7 unchanged blocks hidden)
    }

  # module.test_model.databricks_permissions.model_orchestrator will be updated in-place
  ~ resource "databricks_permissions" "model_orchestrator" {
        id          = "/jobs/939182301413075"
        # (2 unchanged attributes hidden)

      - access_control {
          - group_name       = "infrastructure" -> null
          - permission_level = "CAN_MANAGE_RUN" -> null
        }
      - access_control {
          - group_name       = "users" -> null
          - permission_level = "CAN_VIEW" -> null
        }
      + access_control {
          + permission_level       = "IS_OWNER"
          + service_principal_name = "24b3d6ce-fe33-484d-92b4-0484841a38"
        }
      + access_control {
          + group_name       = "infrastructure"
          + permission_level = "CAN_MANAGE_RUN"
        }
      + access_control {
          + group_name       = "users"
          + permission_level = "CAN_VIEW"
        }
    }

Plan: 0 to add, 2 to change, 0 to destroy.

Steps to Reproduce

Run terraform apply multiple times on the configuration above.

Terraform and provider versions

❯ terraform --version                                                                                                                                                                                                                          
Terraform v1.5.7
on darwin_arm64

databricks provider version = "1.42.0"

Is it a regression?

No.

Important Factoids

The task ordering issue seems related to https://discuss.hashicorp.com/t/dynamic-task-foreach-order-changes/48699/7. As for the permissions, I do not understand why Terraform removes and re-adds the same access_control blocks each time...
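
In the meantime, a possible stopgap on my side (just a workaround sketch, not a fix, and it also hides genuine edits to the ignored blocks) would be to tell Terraform to ignore the blocks that keep producing these spurious diffs:

resource "databricks_job" "model_orchestrator" {
  # ... existing job configuration from above ...

  lifecycle {
    # Workaround sketch: suppress the spurious task-reordering diff.
    # This also hides real changes to task blocks, so remove it before
    # making intentional edits.
    ignore_changes = [task]
  }
}

resource "databricks_permissions" "model_orchestrator" {
  job_id = databricks_job.model_orchestrator.id
  # ... existing access_control blocks from above ...

  lifecycle {
    # Workaround sketch: suppress the remove/re-add diff on access_control.
    ignore_changes = [access_control]
  }
}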

Shuhua-Wang-OV commented 4 months ago

I have the same issue. In the databricks_job resource, terraform plan always shows changes to the task blocks even though I didn't change anything in the config file.

anavrotski commented 3 months ago

I have the same issue with databricks_permissions for the SQL endpoint. databricks provider version = "1.41.0"

mgyucht commented 1 week ago

I believe the issue with the databricks_permissions diff may be related to the fact that this resource currently removes the IS_OWNER permission from the state, so Terraform always plans to add the IS_OWNER block back. At the same time, we don't want to add IS_OWNER to the state when the user hasn't specified it, otherwise there will be a diff showing IS_OWNER being removed. I have a PR that might address this by including IS_OWNER in the state when the user has explicitly included it in their configuration, and otherwise leaving it out: https://github.com/databricks/terraform-provider-databricks/pull/3956.
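
Concretely, the intended behavior would look roughly like this (a sketch of the two cases, not the final contract):

resource "databricks_permissions" "model_orchestrator" {
  job_id = databricks_job.model_orchestrator.id

  # Case 1: IS_OWNER declared explicitly. With the PR, it would be kept in
  # state, so plans should stop re-adding it on every run.
  access_control {
    service_principal_name = var.jobs_runner_service_principal_application_id
    permission_level       = "IS_OWNER"
  }

  access_control {
    group_name       = "infrastructure"
    permission_level = "CAN_MANAGE_RUN"
  }
}

Case 2 is the opposite: if the IS_OWNER block is omitted from the configuration, the provider would leave IS_OWNER out of the state as well, so no diff shows it being removed.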

junwei-db commented 1 week ago

We are also seeing the same issue when deploying databricks_job and databricks_permissions resources.

For databricks_job, another parameter that keeps showing up as a diff in every Terraform deployment is the job-level timeout setting:

 # databricks_job.routing_job_node_types will be updated in-place
  ~ resource "databricks_job" "routing_job_node_types" {
        always_running            = false
        control_run_state         = false
        format                    = "MULTI_TASK"
        id                        = "491136304029932"
        max_concurrent_runs       = 1
        max_retries               = 0
        min_retry_interval_millis = 0
        name                      = "Routing Job for node_types (routing_job_node_types)"
        retry_on_timeout          = false
        tags                      = {
            "clusterIdentifier" = "routing_job"
            "clusterOwner"      = "eng-foresight-team"
        }
      ~ timeout_seconds           = 0 -> 18000
      ....
}

timeout_seconds is the only diff for this resource, and it shows up every time even though the value has already been applied previously.
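
A possible stopgap on our side (just a workaround sketch; it also hides intentional changes to the timeout) would be to ignore changes to that attribute until the root cause is found:

resource "databricks_job" "routing_job_node_types" {
  # ... existing job configuration ...
  timeout_seconds = 18000

  lifecycle {
    # Workaround sketch: stop the perpetual 0 -> 18000 diff on timeout_seconds.
    ignore_changes = [timeout_seconds]
  }
}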