Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] `databricks_file` resource does not store md5 in state #3289

Open terrymunro opened 5 months ago

terrymunro commented 5 months ago

Configuration

terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
      version = "1.37.0"
    }
  }
}

provider "databricks" {}

data "databricks_current_user" "me" {}

resource "databricks_file" "example" {
  source = "${path.module}/hello.sh"
  # Assuming this volume already exists.
  path = "/Volumes/example/default/hello.sh"
}

resource "databricks_workspace_file" "example" {
  source = "${path.module}/hello.sh"
  path = "${data.databricks_current_user.me.home}/hello.sh"
}

Expected Behavior

The md5 attribute should be populated with the md5 hash of the source file for both resources.

Actual Behavior

It is only populated for databricks_workspace_file.

On the first apply, the md5 attribute for both resources shows as the default "different".

After changing the file and re-applying, databricks_workspace_file shows a change on the md5 attribute (from the stored hash of the file to "different"), but databricks_file does not detect any change.

Both resources log the md5 correctly, but only databricks_workspace_file saves the hash to state; in the diff it shows as "different".

Steps to Reproduce

  1. Create an example file, e.g. printf '#!/bin/bash\n\necho "Hello World"\n' > hello.sh
  2. Update the example paths as needed / create a Volume
  3. terraform apply
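
To see the value the provider should be recording, you can compute the hash locally before running terraform apply (a sketch; the filename is from the steps above):

```shell
# Create the sample file; printf interprets the \n escapes,
# unlike plain echo in bash's default mode.
printf '#!/bin/bash\n\necho "Hello World"\n' > hello.sh

# This is the value the provider should store in the md5 attribute.
md5sum hello.sh
```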

Terraform and provider versions

Terraform v1.7.4 on linux_amd64

  • provider registry.terraform.io/databricks/databricks v1.37.0

Is it a regression?

No. The databricks_file resource was first released in the latest version (v1.37.0).

Debug Output

2024-02-22T18:19:51.712+1000 [DEBUG] refresh: databricks_file.example: no state, so not refreshing
data.databricks_current_user.me: Reading...
2024-02-22T18:19:51.714+1000 [INFO]  provider.terraform-provider-databricks_v1.37.0: Reading /home/terry/example/hello.sh: timestamp="2024-02-22T18:19:51.714+1000"
2024-02-22T18:19:51.714+1000 [INFO]  provider.terraform-provider-databricks_v1.37.0: Setting file content hash to 9eb1ab7f9045cd04748e5798ea9e0cb6: timestamp="2024-02-22T18:19:51.714+1000"
2024-02-22T18:19:51.714+1000 [INFO]  provider.terraform-provider-databricks_v1.37.0: Suppressing  diff: false: timestamp="2024-02-22T18:19:51.714+1000"
2024-02-22T18:19:51.715+1000 [WARN]  Provider "registry.terraform.io/databricks/databricks" produced an invalid plan for databricks_file.example, but we are tolerating it because it is using the legacy plugin SDK.

2024-02-22T18:19:52.197+1000 [DEBUG] refresh: databricks_workspace_file.example: no state, so not refreshing
2024-02-22T18:19:52.201+1000 [INFO]  provider.terraform-provider-databricks_v1.37.0: Reading /home/terry/example/hello.sh: timestamp="2024-02-22T18:19:52.201+1000"
2024-02-22T18:19:52.201+1000 [INFO]  provider.terraform-provider-databricks_v1.37.0: Setting file content hash to 9eb1ab7f9045cd04748e5798ea9e0cb6: timestamp="2024-02-22T18:19:52.201+1000"
2024-02-22T18:19:52.201+1000 [INFO]  provider.terraform-provider-databricks_v1.37.0: Suppressing  diff: false: timestamp="2024-02-22T18:19:52.201+1000"
2024-02-22T18:19:52.203+1000 [WARN]  Provider "registry.terraform.io/databricks/databricks" produced an invalid plan for databricks_workspace_file.example, but we are tolerating it because it is using the legacy plugin SDK.
path=.terraform/providers/registry.terraform.io/databricks/databricks/1.37.0/linux_amd64/terraform-provider-databricks_v1.37.0 pid=2756954
  # databricks_file.example will be created
  + resource "databricks_file" "example" {
      + file_size         = (known after apply)
      + id                = (known after apply)
      + md5               = "different"
      + modification_time = (known after apply)
      + path              = "/Volumes/example/default/hello.sh"
      + source            = "/home/terry/example/hello.sh"
    }

  # databricks_workspace_file.example will be created
  + resource "databricks_workspace_file" "example" {
      + id             = (known after apply)
      + md5            = "different"
      + object_id      = (known after apply)
      + path           = "/Users/terry.munro@mantelgroup.com.au/hello.sh"
      + source         = "/home/terry/example/hello.sh"
      + url            = (known after apply)
      + workspace_path = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

After changing the file and re-applying:

  # databricks_workspace_file.example will be updated in-place
  ~ resource "databricks_workspace_file" "example" {
        id             = "/Users/terry.munro@mantelgroup.com.au/hello.sh"
      ~ md5            = "9eb1ab7f9045cd04748e5798ea9e0cb6" -> "different"
        # (5 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Important Factoids

For now, you can work around this by setting the md5 attribute manually, although the attribute is not documented:

locals {
  file = "${path.module}/hello.sh"
}

resource "databricks_file" "example" {
  source = local.file
  md5 = filemd5(local.file)
  path = "/Volumes/example/default/hello.sh"
}
tanmay-db commented 5 months ago

Hi @terrymunro, thanks for reaching out. We will take a look.

Chocanto commented 2 months ago

We are also having this issue. databricks_file does not detect when the source file has changed.

Thank you for your help on this issue :)

tsndqst commented 1 month ago

It looks like this might be fixed with https://github.com/databricks/terraform-provider-databricks/pull/3662

This is also an issue with the databricks_global_init_script resource but I don't think the above PR will fix the init script resource. The md5 attribute trick mentioned above works around the issue though.
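
The same workaround can be sketched for the init script resource (untested; it assumes databricks_global_init_script exposes source and md5 the same way, which is what the comment above relies on — the name value is just an example):

```hcl
locals {
  init_script = "${path.module}/init.sh"
}

resource "databricks_global_init_script" "example" {
  name   = "example"
  source = local.init_script
  # Setting md5 explicitly so changes to the local file are detected.
  md5    = filemd5(local.init_script)
}
```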