elastic / terraform-provider-ec

https://registry.terraform.io/providers/elastic/ec/latest/docs
Apache License 2.0

Resizing a deployment with terraform resets the default cloud snapshot policy to its defaults #854

Closed frconil closed 1 month ago

frconil commented 2 months ago

Readiness Checklist

Expected Behavior

Given a manifest that both defines the size of the deployment (e.g. the number of nodes for a given tier) and updates the default SLM policy, I would expect the SLM policy to stay the same across terraform runs as long as I do not change it in the manifest.

Current Behavior

If I modify the number of nodes, the first terraform apply also (silently) resets the SLM policy to the cloud defaults. A second terraform apply then updates the SLM policy back to what the terraform definition specifies.

Terraform definition

terraform {
  required_version = ">= 1.0.0"
  required_providers {
    ec = {
      source = "elastic/ec"
    }
    elasticstack = {
      source = "elastic/elasticstack"
    }
  }
}

provider "ec" {
apikey = "REDACTED"
}

resource "ec_deployment" "custom-deployment" {
  name                   = "My deployment identifier"
  region                 = "gcp-europe-west3"
  version                = "8.15.0"
  deployment_template_id = "gcp-memory-optimized-v2"

 elasticsearch = {
    hot = {
      size = "4g"
      zone_count="3"
      autoscaling = {}
    }
  }
  kibana = {}
}

provider "elasticstack" {
  elasticsearch {
    username = ec_deployment.custom-deployment-fc.elasticsearch_username
    password = ec_deployment.custom-deployment-fc.elasticsearch_password
  endpoints = ["${ec_deployment.custom-deployment-fc.elasticsearch.https_endpoint}"]
  }
}

resource "elasticstack_elasticsearch_snapshot_lifecycle" "cloud-snapshot-policy" {
  name = "cloud-snapshot-policy"
  schedule      = "0 0 1 * * ?"
  snapshot_name = "<cloud-snapshot-{now/d}>"
  repository    = "found-snapshots"
  include_global_state = true
  expire_after = "30d"
  min_count    = 5
  max_count    = 50
}

Steps to Reproduce

  1. Using the manifest above, update the zone_count parameter up or down.
  2. Run terraform apply twice.
  3. The first run will return:
Terraform will perform the following actions:

  # ec_deployment.custom-deployment will be updated in-place
  ~ resource "ec_deployment" "custom-deployment" {
      ~ elasticsearch          = {
          ~ hot            = {
              ~ node_roles                       = [
                  - "data_content",
                  - "data_hot",
                  - "ingest",
                  - "master",
                  - "remote_cluster_client",
                  - "transform",
                ] -> (known after apply)
              ~ zone_count                       = 3 -> 2
                # (5 unchanged attributes hidden)
            }
            # (10 unchanged attributes hidden)
        }
        id                     = "REDACTED"
        name                   = "My deployment identifier"
        # (9 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
  4. The second run will return:
Terraform will perform the following actions:

  # elasticstack_elasticsearch_snapshot_lifecycle.cloud-snapshot-policy will be updated in-place
  ~ resource "elasticstack_elasticsearch_snapshot_lifecycle" "cloud-snapshot-policy" {
      ~ expire_after         = "259200s" -> "30d"
        id                   = "7DvRmusHQTSmaV8pvdQcGw/cloud-snapshot-policy"
      ~ max_count            = 100 -> 50
      ~ min_count            = 10 -> 5
        name                 = "cloud-snapshot-policy"
      ~ partial              = true -> false
      ~ schedule             = "0 */30 * * * ?" -> "0 0 1 * * ?"
        # (7 unchanged attributes hidden)
    }

Running the same manifest without specifying the deployment size, for instance:

  elasticsearch = {
    hot = {
      autoscaling = {}
    }
  }

does not modify the policy regardless of any resizing operations in the cloud console UI, which points the issue towards cluster sizing.

Context

This can be a problem when configuring retention periods longer than the default: as shown in the second plan above, the reset drops expire_after from 30d back to the cloud default of 259200s (3 days), so the SLM policy could delete older snapshots before the change is picked up again.

As the first apply also resets the SLM policy silently, this could introduce changes that go unnoticed until the next update of the terraform manifests, if they are noticed at all.
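
Since the reset is silent, one way to spot it between the two applies is to read the policy back directly. A minimal check, assuming curl and placeholder ES_ENDPOINT / ES_USER / ES_PASS values for the deployment (Kibana Dev Tools works just as well):

# Right after the first apply, the live policy shows the cloud defaults
# (e.g. max_count 100, min_count 10, expire_after 259200s) instead of the
# values from the manifest.
curl -s -u "$ES_USER:$ES_PASS" "$ES_ENDPOINT/_slm/policy/cloud-snapshot-policy"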

Possible Solution

A workaround is to either perform resize operations via the web interface, or to manually edit the SLM retention schedule so that snapshot cleanup cannot happen in the middle of terraform changes.
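
For the second workaround, the relevant knob is the slm.retention_schedule cluster setting, which controls when SLM retention deletes expired snapshots (the Elasticsearch default is "0 30 1 * * ?", i.e. 01:30 daily). A minimal sketch, again assuming curl and the same placeholder endpoint/credentials as above:

# Temporarily move SLM retention away from the change window:
curl -s -u "$ES_USER:$ES_PASS" -X PUT "$ES_ENDPOINT/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{"persistent": {"slm.retention_schedule": "0 30 23 * * ?"}}'

# Once the second terraform apply has restored the policy, drop the override:
curl -s -u "$ES_USER:$ES_PASS" -X PUT "$ES_ENDPOINT/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{"persistent": {"slm.retention_schedule": null}}'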

Your Environment

gigerdo commented 1 month ago

Will be fixed with the 0.12.0 release this week.
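
Once that release is available, constraining the provider makes sure the fix is actually picked up, e.g. version = ">= 0.12.0" under the ec entry in required_providers, followed by:

terraform init -upgrade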