bgrams opened this issue 2 months ago
@bgrams The 2nd item is explicitly documented: https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/cluster#policy_id
Thanks for sharing this, @alexott. Perhaps I could have filed that one as an enhancement rather than an issue - both are included here given that they are likely addressed by the same fix. If you prefer that I create a different issue to track these separately then I can do that.
Re. drift - is this an explicit design decision or do we agree that this is something that could/should be improved? Consider the following use case for cluster policies from the Databricks documentation:
> Simplify the user interface and enable more users to create their own clusters
The API does not require that policy configuration defaults also be provided in the request, and having this requirement for the provider alone is a complicating factor for the end user. Therefore I imagine this would be a useful and reasonable improvement, and I don't believe the implementation would be particularly challenging. However I acknowledge that there is likely some relevant historical context that I am missing.
What do you think?
Terraform itself expects fields to be set explicitly - if a new value comes from the backend, then the field should be marked as computed, otherwise TF will fail with an error. But marking all fields as computed isn't feasible and leads to errors, like not being able to remove the driver node type, etc.
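To illustrate the pattern being described (a minimal sketch using the SDKv2 schema types, not the provider's exact definition):

```go
import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

// Minimal sketch: a field whose value may come back from the backend must be
// marked Computed in addition to Optional, otherwise Terraform errors out on
// the inconsistency. But once it is Optional+Computed, removing it from the
// config no longer clears it - the driver node type problem mentioned above.
var driverNodeTypeID = &schema.Schema{
	Type:     schema.TypeString,
	Optional: true,
	Computed: true,
}
```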
I also just tried with the latest version from the repo, and I don't see any drift with apply_policy_default_values = true as long as I provide all the necessary details.
Default values don't necessarily need to be computed from the backend - we know the policy at the time of cluster creation and can predict the API behavior.
> as long as I provide all the necessary details.
To clarify, this is the complicating factor that I think could be improved. We have many TF-managed clusters across dozens of repos. Keeping all of these up to date whenever there is a policy change is tedious and shouldn't be necessary as it is not required by the Clusters API.
A solution may be to snapshot the policy and store it in state, then use this within a `CustomizeDiffFunc` to suppress an expected diff. I'll see if I can come up with a working example to share, but happy to entertain other ideas as well.
Here's an example function that correctly suppresses drift when set as an attribute's `DiffSuppressFunc`:
```go
import (
	"encoding/json"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

func appliedPolicyDiffSuppressFunc(k, old, new string, d *schema.ResourceData) bool {
	// When a policy_id is provided during cluster creation, we snapshot the policy and
	// merge the definition with policy_family_definition_overrides. This is then
	// serialized to JSON and the merged result is persisted at this address.
	policyString, ok := d.GetOk("applied_policy")
	if !ok {
		return false
	}
	var appliedPolicy map[string]map[string]interface{}
	if err := json.Unmarshal([]byte(policyString.(string)), &appliedPolicy); err != nil {
		return false
	}
	// Policy definition and resource attributes are keyed identically.
	// This couldn't possibly be more convenient.
	policy, ok := appliedPolicy[k]
	if !ok {
		return false
	}
	// Fixed attributes will always be applied.
	if policy["type"] == "fixed" {
		value, _ := policy["value"].(string)
		return new == "" && old == value
	}
	// Default attributes should only be considered when apply_policy_default_values is true.
	if applyDefaults, ok := d.GetOk("apply_policy_default_values"); ok && applyDefaults.(bool) {
		if defaultValue, ok := policy["defaultValue"].(string); ok {
			return new == "" && old == defaultValue
		}
	}
	return false
}
```
Tested using `spark_conf`, but I think it generalizes to all cluster attributes.
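For context, here's a rough sketch of how this could be wired into the resource schema - the `applied_policy` attribute and the exact layout are my assumptions, not the provider's current code:

```go
import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

// Rough sketch only - attribute names and layout are illustrative.
var clusterSchemaSketch = map[string]*schema.Schema{
	// Hypothetical computed snapshot of the merged policy definition, stored as JSON.
	"applied_policy": {
		Type:     schema.TypeString,
		Computed: true,
	},
	// For map attributes the suppress function is invoked per element, e.g. with
	// keys like "spark_conf.spark.databricks.cluster.profile", which line up with
	// the keys used in policy definitions.
	"spark_conf": {
		Type:             schema.TypeMap,
		Optional:         true,
		Elem:             &schema.Schema{Type: schema.TypeString},
		DiffSuppressFunc: appliedPolicyDiffSuppressFunc,
	},
}
```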
Thoughts?
That solves only the diff problem. How would you signal to TF that the policy has been updated on the other side? Right now you can build the cluster configuration driven by the cluster policy definition using TF code itself, with no provider modification required.
Yes, that is precisely the problem that I am hoping to solve. The diff presented in the plan under these conditions is otherwise meaningless and incorrect, as it has no effect on the backend result, so why do we need it?
The `applied_policy` in the above solution can be recomputed upon cluster edit or when the `policy_id` changes. There are some edge cases I can think of where this would not be a perfect solution (e.g. a sequence of several external policy updates and cluster restarts over time), but I'm not sure that capturing these is possible without additional API support. In those cases, plan behavior would match what exists today, so suppressing the diff of "known" policy-defined attributes would still be a net improvement.
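As a sketch of that recompute (again illustrative, assuming the hypothetical `applied_policy` attribute from above and SDKv2's `CustomizeDiff` hook):

```go
import (
	"context"

	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Illustrative only: when the referenced policy changes, mark the stored
// snapshot as "known after apply" so it gets refreshed and the suppress
// function compares against the new definition.
func recomputeAppliedPolicy(ctx context.Context, d *schema.ResourceDiff, meta interface{}) error {
	if d.HasChange("policy_id") {
		return d.SetNewComputed("applied_policy")
	}
	return nil
}
```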
I think the only alternative would be to mark all fields as computed and build the full request in the provider using policy values, which I imagine is similar to the client logic in the workspace UI, but to your point that may not be a feasible approach (though maybe this is worth revisiting).
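Purely as an illustration of that alternative (a hypothetical helper, not something that exists in the provider), merging policy values into the request might look like:

```go
// Hypothetical helper: merge policy rules into a user-supplied cluster spec,
// approximating what the backend does when apply_policy_default_values is set.
func mergePolicyDefaults(spec map[string]any, policy map[string]map[string]any, applyDefaults bool) {
	for key, rule := range policy {
		if rule["type"] == "fixed" {
			spec[key] = rule["value"] // fixed values always win
			continue
		}
		if _, alreadySet := spec[key]; alreadySet {
			continue // user-provided values take precedence over defaults
		}
		if applyDefaults {
			if v, ok := rule["defaultValue"]; ok {
				spec[key] = v
			}
		}
	}
}
```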
Hey @alexott, I'd love it if this capability could land in a future release - would you like me to open a PR, or is there anything else we should consider before then?
The engineering team should discuss this. I think the proposed solution won't help with cases where the policy is updated outside of the cluster definition, and this will lead to more issues - we already have quite a lot of edge cases where `computed` is used (and we need it to specify that a field is generated outside of Terraform).
In the meantime, you can use the cluster policy data source to populate cluster attributes from the policy, like this: https://github.com/databricks/terraform-provider-databricks/issues/1787#issuecomment-1371886145
Cc: @mgyucht
Provider does not consider policy-provided configuration when a `databricks_cluster` is configured with `apply_policy_default_values = true`. This results in 2 separate issues:

1. If the policy provides `num_workers` defaults, as is the case with all Databricks-provided policy families, the cluster validation procedure will fail due to it considering these attributes to contain their zero value if they are unset in the HCL.
2. Attributes that the policy applies on the backend, but which are not set in the HCL, produce a permanent diff in the plan.

Configuration
Expected Behavior
Provider respects policy-provided attributes for `databricks_cluster` resource configuration.

Actual Behavior
Provider does not respect policy-provided attributes for `databricks_cluster` resource configuration.

Steps to Reproduce
1. `terraform apply -target=databricks_cluster.this_will_fail_on_apply` - will fail
2. `terraform apply -target=databricks_cluster.this_will_drift_forever`
3. `terraform plan -target=databricks_cluster.this_will_drift_forever` - will produce a diff

Terraform and provider versions
Terraform 1.5.6
Provider 1.51.0
Is it a regression?
No
Debug Output
N/A
Important Factoids
No
Would you like to implement a fix?
Possibly