elastic / terraform-provider-ec

https://registry.terraform.io/providers/elastic/ec/latest/docs

Add documentation on choosing a deployment template and how that choice impacts the expected provider usage. #705

Open davidmaceachern opened 1 year ago

davidmaceachern commented 1 year ago

Our current docs link through to the complete list of deployment templates available in ESS. This is pretty overwhelming, and also potentially misleading to new customers. For example, the old dedicated hot-warm, cross cluster search, and enterprise search templates are listed there, which have been made completely redundant by the current templates. We should either improve the existing docs to mark old templates as deprecated, or develop simplified, provider specific docs helping customers choose a template.


Original issue

Readiness Checklist

Expected Behavior

When running terraform apply to remove a data tier, any failure to remove the tier should produce an error that indicates why the problem occurred.

Current Behavior

As expanded on in the definition section below, terraform plan and terraform apply are run in GitLab CI/CD using the following Docker image: https://hub.docker.com/layers/hashicorp/terraform/1.2.7/images/sha256-8e4d010fc675dbae1eb6eee07b8fb4895b04d144152d2ef5ad39724857857ccb?context=explore

Terraform definition

We are implementing this by wrapping the provider in a Terraform module; the module details have been omitted for brevity.

The error was first encountered on provider 0.7.0 and Terraform 1.2.0. Here are the provider details used in the most recent attempt, from the module's provider.tf:

terraform {
  required_version = ">= 1.2.7"

  required_providers {
    ec = {
      source  = "elastic/ec"
      version = "~>0.8.0"
    }
  }
}

First, a cluster was created with the following configuration in elastic.tf; a warm tier was never explicitly added to the cluster.

module "elastic" {
  deployment_template_id = "aws-hot-warm-v3"
  hot_tier_topology = {
    zone_count    = 1
    size          = "1g"
    size_resource = "memory"
    autoscaling   = {}
  }
}
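
To give a rough idea of the omitted module internals, the module wraps a single ec_deployment resource along these lines. This is only a sketch: the name, region, and version values are illustrative placeholders, and the variable names mirror the module inputs shown above.

resource "ec_deployment" "default" {
  name                   = "example-cluster" # illustrative placeholder
  region                 = "us-east-1"       # illustrative placeholder
  version                = "8.9.0"           # illustrative placeholder
  deployment_template_id = var.deployment_template_id

  elasticsearch = {
    # Only the hot tier is declared; no warm tier is ever defined,
    # which matters later with the aws-hot-warm-v3 template.
    hot = var.hot_tier_topology
  }
}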

The resource configuration we wish to apply, in elastic.tf:

module "elastic" {
  deployment_template_id = "aws-storage-optimized-dense-v4"

  autoscale = true

  master_topology = {
    zone_count    = 3
    size          = "4g"
    size_resource = "memory"
    autoscaling = {
      max_size          = "4g"
      max_size_resource = "memory"
    }
  }

  hot_tier_topology = {
    zone_count    = 3
    size          = "1g"
    size_resource = "memory"
    autoscaling = {
      max_size          = "1g"
      max_size_resource = "memory"
    }
  }
  cold_tier_topology = {
    zone_count    = 2
    size          = "2g"
    size_resource = "memory"
    autoscaling = {
      max_size          = "2g"
      max_size_resource = "memory"
    }
  }
  frozen_tier_topology = {
    zone_count    = 2
    size          = "4g"
    size_resource = "memory"
    autoscaling = {
      max_size          = "4g"
      max_size_resource = "memory"
    }
  }
}

Steps to Reproduce

  1. terraform apply a cluster with hot tier and deployment template aws-hot-warm-v3
  2. terraform apply the addition of some other tiers, without a warm tier
  3. Encounter the following error when applying the change:
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to module.elastic[0].ec_deployment.default, provider
│ "provider[\"registry.terraform.io/elastic/ec\"]" produced an unexpected new
│ value: .elasticsearch.warm: was null, but now
│ cty.ObjectVal(map[string]cty.Value{"autoscaling":cty.ObjectVal(map[string]cty.Value{"max_size":cty.StringVal("116g"),
│ "max_size_resource":cty.StringVal("memory"),
│ "min_size":cty.NullVal(cty.String),
│ "min_size_resource":cty.NullVal(cty.String),
│ "policy_override_json":cty.NullVal(cty.String)}),
│ "instance_configuration_id":cty.StringVal("aws.data.highstorage.d2"),
│ "node_roles":cty.SetVal([]cty.Value{cty.StringVal("data_warm"),
│ cty.StringVal("remote_cluster_client")}),
│ "node_type_data":cty.NullVal(cty.String),
│ "node_type_ingest":cty.NullVal(cty.String),
│ "node_type_master":cty.NullVal(cty.String),
│ "node_type_ml":cty.NullVal(cty.String), "size":cty.StringVal("4g"),
│ "size_resource":cty.StringVal("memory"),
│ "zone_count":cty.NumberIntVal(2)}).
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.
  4. terraform apply a deployment template which could be better suited to the cluster, e.g. aws-storage-optimized-dense-v4
  5. Encounter a second, different error:
│ Error: Duplicate Set Element
│ 
│   with module.elastic[0].ec_deployment.default,
│   on .terraform/modules/elastic/main.tf line 20, in resource "ec_deployment" "default":
│   20: resource "ec_deployment" "default" {
│ 
│ This attribute contains duplicate values of:
│ tftypes.Object["name":tftypes.String, "type":tftypes.String,
│ "url":tftypes.String,
│ "version":tftypes.String]<"name":tftypes.String<"google-workspace">,
│ "type":tftypes.String<"bundle">, "url":tftypes.String<"repo://1142903475">,
│ "version":tftypes.String<"8.9.0">>

Context

The goal is to have a cluster completely defined using Terraform. If it is not easy to iterate on cluster changes, then the documentation should indicate that it is best to configure a cluster manually through the dashboard before declaring it in Terraform.

For future cluster changes, it would also help to understand from the documentation which changes require manual testing via the dashboard, and which changes can be done via Terraform.

Terraform state should not become corrupted in a way that requires manual intervention to fix; the provider should either apply the changes successfully or the documentation should explain how to resolve the issue.

Possible Solution

What has been tried so far:

What has not been tried:

Your Environment

tobio commented 1 year ago

This sounds like the google-workspace bundle is either defined in the TF resource twice, or somehow getting duplicated when the provider is reading state from the API. I'd need to see the actual module code to help more on this one.

│ Error: Duplicate Set Element
│ 
│   with module.elastic[0].ec_deployment.default,
│   on .terraform/modules/elastic/main.tf line 20, in resource "ec_deployment" "default":
│   20: resource "ec_deployment" "default" {
│ 
│ This attribute contains duplicate values of:
│ tftypes.Object["name":tftypes.String, "type":tftypes.String,
│ "url":tftypes.String,
│ "version":tftypes.String]<"name":tftypes.String<"google-workspace">,
│ "type":tftypes.String<"bundle">, "url":tftypes.String<"repo://1142903475">,
│ "version":tftypes.String<"8.9.0">>

module "elastic" {
  deployment_template_id = "aws-hot-warm-v3"
  hot_tier_topology = {
    zone_count    = 1
    size          = "1g"
    size_resource = "memory"
    autoscaling   = {}
  }
}

You probably don't want to use the hot-warm template here. I think you've figured this out, but all data tiers are available on all the current templates. Something like aws-storage-optimized-dense-v4 will be better suited. There are probably some updates we can make to the Terraform docs around selecting a template.

IIRC the hot-warm template has a default non-zero size for the warm tier, which is causing the Provider produced inconsistent result error. If you want to use the hot-warm template (you almost certainly don't), then you'd need to define a warm tier. We might be able to improve how the provider handles this, but that would likely just be a nicer error message (e.g. "Using the x template requires defining y data tier"), since resolving the inconsistent result may be somewhat complex.
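
If you did stay on hot-warm, defining the warm tier would look something like this inside the deployment's elasticsearch block. The sizes here are illustrative, not the template defaults.

elasticsearch = {
  hot = {
    zone_count    = 1
    size          = "1g"
    size_resource = "memory"
    autoscaling   = {}
  }
  # Declaring the warm tier explicitly, so the planned value matches
  # the template's non-zero default instead of producing an
  # inconsistent result after apply.
  warm = {
    zone_count    = 2
    size          = "4g"
    size_resource = "memory"
    autoscaling   = {}
  }
}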

davidmaceachern commented 1 year ago

Thanks for the fast response @tobio !

This sounds like the google-workspace bundle is either defined in the TF resource twice <...>

Great spot, and sorry for the brief example I provided. We figured out yesterday that it was indeed defined twice! We have a workaround that removes it, and the fix will be to move it outside of the module.
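
For anyone else hitting this: the bundle is declared under the elasticsearch extension set in the deployment resource, so declaring the same object both inside and outside the module produced two identical set elements. The single declaration we're keeping looks roughly like this, with the name, url, and version values taken from the error output above.

elasticsearch = {
  extension = [
    # Declared exactly once; a second identical object anywhere else
    # in the configuration triggers the Duplicate Set Element error.
    {
      name    = "google-workspace"
      type    = "bundle"
      version = "8.9.0"
      url     = "repo://1142903475"
    }
  ]
}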


There's probably some updates we can make to the Terraform docs around selecting a template.

Is there a schema somewhere that declares what is going to be a non-zero configuration?

I've had a dig into the source code and searched the docs; however, the only guidance I have right now is the list in the official Elastic docs: https://www.elastic.co/guide/en/cloud/current/ec-regions-templates-instances.html

I understand it is also possible to discover this by running terraform plan for a new cluster; however, I don't have a fast way of doing that right now.

davidmaceachern commented 1 year ago

Update on the data tier blocker: we managed to manually remove the warm tier from the ES dashboard and remove the Google Workspace plugin; rerunning terraform plan and terraform apply then allowed us to successfully switch the deployment template.

tobio commented 1 year ago

Is there a schema somewhere that declares what is going to be a non-zero configuration?

The defaults are defined in the deployment template. You can get the template definition via the API; the default plan for that template is defined in the deployment_template property.
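
If it helps, you can keep that lookup inside Terraform with the hashicorp/http data source (v3+, which exposes response_body). This is just a sketch: var.ec_api_key is a placeholder for an ESS API key, and the attribute paths reflect the v1 response shape as I remember it, so verify them against the actual payload.

data "http" "template" {
  url = "https://api.elastic-cloud.com/api/v1/deployments/templates/aws-hot-warm-v3?region=us-east-1"

  request_headers = {
    Authorization = "ApiKey ${var.ec_api_key}" # placeholder variable holding an ESS API key
  }
}

output "default_tier_sizes" {
  # Default size of each tier in the template's default plan.
  value = [
    for t in jsondecode(data.http.template.response_body).deployment_template.resources.elasticsearch[0].plan.cluster_topology : {
      id   = t.id
      size = try(t.size.value, null) # memory in MB, if I remember the shape correctly
    }
  ]
}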