elastic / terraform-provider-ec

https://registry.terraform.io/providers/elastic/ec/latest/docs
Apache License 2.0

Race condition on creating Deployment with custom config that relies on keystore secret #433

Closed jadrol closed 12 months ago

jadrol commented 2 years ago

When you specify custom elasticsearch.yaml settings via elasticsearch.config.user_settings_yaml, and those settings require secure parameters stored in the keystore, you hit a race condition when creating a new deployment or editing an existing one.

Expected Behavior

Apply should work without failing.

Current Behavior

Example of configuration documentation: https://www.elastic.co/guide/en/cloud/current/ec-securing-clusters-oidc-op.html#ec-securing-oidc-okta (steps 2 & 3)

What is happening:

  1. The deployment is created or updated with the new elasticsearch.yaml settings.
  2. The keystore secret depends on the deployment, so it waits for step 1 to finish.
  3. Elasticsearch cannot restart in step 1 because the secret it needs is not in the keystore yet, so apply either hangs (when creating a deployment from scratch) or fails immediately (when applying to an existing deployment).

 Terraform definition

resource "ec_deployment" "race-condition-cluster" {
  name = "race-condition-cluster"

  region                 = "eu-west-1"
  version                = "7.16.3"
  deployment_template_id = "aws-observability"

  elasticsearch {
    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }

    config {
      user_settings_yaml = <<EOF
xpack.security.authc.realms.oidc.oidc1:
  order: 2
  rp.client_id: client_id
  rp.response_type: "code"
  rp.requested_scopes: ["openid", "email"]
  rp.redirect_uri: "https://kibana.example.com/api/security/oidc/callback"
  op.issuer: "https://sample.okta.com"
  op.authorization_endpoint: "https://sample.okta.com/oauth2/v1/authorize"
  op.token_endpoint: "https://sample.okta.com/oauth2/v1/token"
  op.userinfo_endpoint: "https://sample.okta.com/oauth2/v1/userinfo"
  op.endsession_endpoint: "https://sample.okta.com/oauth2/v1/logout"
  op.jwkset_path: "https://sample.okta.com/oauth2/v1/keys"
  claims.principal: email
  claim_patterns.principal: "^([^@]+)@sample\\.com$"
EOF
    }
  }

  kibana {
    topology {
      size       = "1g"
      zone_count = 1
    }

    config {
      user_settings_yaml = <<EOF
xpack.security.authc.providers:
  oidc.oidc1:
    order: 0
    realm: oidc1
    description: "Log in with Okta"
EOF
    }
  }
}

resource "ec_deployment_elasticsearch_keystore" "okta_secret" {
  deployment_id = ec_deployment.race-condition-cluster.id
  setting_name  = "xpack.security.authc.realms.oidc.oidc1.rp.client_secret"
  value         = "my_super_secret"
}

Steps to Reproduce

Apply the sample Terraform config provided above.

Context

We can work around it by temporarily commenting out the custom settings and running apply twice. We would love to be able to do this without manual steps.

Possible Solution

Allow custom elasticsearch.yml entries to be configured via a separate Terraform resource.

andrewnazarov commented 2 years ago

We are facing the same issue and would love to see it fixed. We have also seen other use cases where a dedicated resource for user settings would make sense: in the original example, it might be the rp.redirect_uri or anything else that is known only after the Elasticsearch/Kibana instance is created.

mikeclearbank commented 2 years ago

+1 for the suggestion to have a separate terraform resource to configure a deployment's settings (both elastic and kibana).

waltrinehart commented 2 years ago

+1. This race condition doesn't allow me to set up SSO on cluster creation.

davidg-datascene commented 2 years ago

Same here. The race condition is preventing us from moving to the terraform-provider-ec. We were looking to migrate from using local-exec to execute the API call for the deployment and SSO config and use the provider instead.

waltrinehart commented 2 years ago

It would be nice if this could be addressed; it is currently slowing down our automation.

saimantr commented 1 year ago

Any fix for this yet? The newer version of the ec provider (0.6) helps resolve the circular dependency between the Kibana URL and the elasticsearch.yaml file; however, deployments are still failing because of the dependency on the keystore value. The following error was noticed:

'''
[instance-0000000001] fatal exception while booting Elasticsearch
java.lang.IllegalStateException: security initialization failed
    at org.elasticsearch.xpack.security.Security.createComponents(Security.java:578) ~[?:?]
    at org.elasticsearch.node.Node.lambda$new$16(Node.java:721) ~[elasticsearch-8.6.2.jar:?]
    at org.elasticsearch.plugins.PluginsService.lambda$flatMap$0(PluginsService.java:252) ~[elasticsearch-8.6.2.jar:?]
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273) ~[?:?]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
    at java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:722) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260) ~[?:?]
    at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616) ~[?:?]
    at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:622) ~[?:?]
    at java.util.stream.ReferencePipeline.toList(ReferencePipeline.java:627) ~[?:?]
    at org.elasticsearch.node.Node.<init>(Node.java:736) ~[elasticsearch-8.6.2.jar:?]
    at org.elasticsearch.node.Node.<init>(Node.java:322) ~[elasticsearch-8.6.2.jar:?]
    at org.elasticsearch.bootstrap.Elasticsearch$2.<init>(Elasticsearch.java:214) ~[elasticsearch-8.6.2.jar:?]
    at org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:214) ~[elasticsearch-8.6.2.jar:?]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67) ~[elasticsearch-8.6.2.jar:?]
Caused by: org.elasticsearch.common.settings.SettingsException: The configuration setting [xpack.security.authc.realms.oidc.oidc1.rp.client_secret] is required
    at org.elasticsearch.xpack.security.authc.oidc.OpenIdConnectRealm.buildRelyingPartyConfiguration(OpenIdConnectRealm.java:256) ~[?:?]
    at org.elasticsearch.xpack.security.authc.oidc.OpenIdConnectRealm.<init>(OpenIdConnectRealm.java:98) ~[?:?]
    at org.elasticsearch.xpack.security.authc.InternalRealms.lambda$getFactories$7(InternalRealms.java:169) ~[?:?]
    at org.elasticsearch.xpack.security.authc.Realms.initRealms(Realms.java:288) ~[?:?]
    at org.elasticsearch.xpack.security.authc.Realms.<init>(Realms.java:109) ~[?:?]
    at org.elasticsearch.xpack.security.Security.createComponents(Security.java:686) ~[?:?]
    at org.elasticsearch.xpack.security.Security.createComponents(Security.java:566) ~[?:?]
    ... 17 more
'''

tobio commented 1 year ago

@saimantr this is a limitation of the Elastic Cloud API at the moment. There's a request there to support setting keystore values during deployment creation; however, there are no plans to work on that in the near future.

saimantr commented 1 year ago

Thanks for the reply @tobio. My use case is to provision an Elasticsearch cluster with SSO integration in one go using Terraform. Is there any alternative way to achieve this, or any thoughts?

tommed commented 1 year ago

Another ignored ticket :-/ Holding up progress on https://github.com/elastic/terraform-provider-ec/issues/656

dimuon commented 1 year ago

@tommed , we're working on it. I'm sorry for the delay but we have to change both the provider and ECE API to properly handle that use case.

tommed commented 1 year ago

@dimuon Apologies for the comment, but we are finding that a lot of tickets relating to terraform (especially those in hashicorp's own repos) are being left to stagnate, even if the issue is a simple documentation error. It gets quite frustrating when you've convinced a team to move all DSC/IaC work into Terraform and then are stuck because of tickets raised months ago. Best of luck with this one!! Once we have a fix, we will be able to automate our entire service stack via Terraform Cloud - which will be amazing to have working again!

dimuon commented 12 months ago

Should be addressed by https://github.com/elastic/terraform-provider-ec/pull/674

dimuon commented 12 months ago

@tommed, the required backend change was deployed last week, so we merged the corresponding PR for the provider and released version 0.9.0, which addresses the issue.

saimantr commented 11 months ago

Hey @dimuon @tobio, I can still see the same logs and error when trying to configure OIDC for SSO login while creating the cluster. Any thoughts on this?

dimuon commented 11 months ago

@saimantr, are you using 0.9.0 against Elastic Cloud or ECE? Can you please share your TF config?

dimuon commented 11 months ago

If you're running the provider against ECE, please provide the ECE version.

saimantr commented 11 months ago

@dimuon: We are using Elasticsearch Service for this and yes, the provider version is set to 0.9.0. The Terraform version is 1.4.6.

terraform {
  required_version = ">= 0.13.1"
  required_providers {
    random = {
      source  = "hashicorp/random"
      version = "3.5.1"
    }
    ec = {
      source  = "elastic/ec"
      version = "0.9.0"
    }
    elasticstack = {
      source  = "elastic/elasticstack"
      version = "0.5.0"
    }
  }
}

provider "ec" {
  apikey = var.ec_api_key
}

resource "random_uuid" "uuid" {}

#Resource block actually responsible for creating elastic deployment.
resource "ec_deployment" "ecaas_deployment" {

  name = "test123"

  region                 = var.region
  version                = var.ec_version             # ec version used is 8.9.2
  deployment_template_id = var.deployment_template_id # aws storage optimized

  elasticsearch =  {
    autoscale = var.autoscale # for this experiment, autoscale is set to "true"

    config = {
        user_settings_yaml = templatefile("./ec_user_settings.yaml.tftpl", {}) # templatefile requires a vars map; none was given in the original
      }

    cold = {
      autoscaling = var.autoscale ? {
        max_size          = "0g"
        max_size_resource = "memory"
      } : {}
    }

    frozen = {
      size       = var.autoscale ? var.frozen_size_def : var.frozen_size
      zone_count = var.frozen_count

      autoscaling = var.autoscale ? {
        max_size          = var.frozen_size
        max_size_resource = "memory"
      } : {}
    }

    hot = {
      size       = var.autoscale ? var.hot_size_def : var.hot_size
      zone_count = var.hot_count

      autoscaling = var.autoscale ? {
        max_size          = var.hot_size
        max_size_resource = "memory"
      } : {}
    }

    ml = {
      autoscaling = var.autoscale ? {
        max_size          = "0g"
        max_size_resource = "memory"
      } : {}
    }

    warm = {
      autoscaling = var.autoscale ? {
        max_size          = "0g"
        max_size_resource = "memory"
      } : {}
    }

  }

  kibana = {
    topology = {
      size       = var.kibana_size
      zone_count = var.kibana_count
    }
    config = {
      user_settings_yaml = file("./kibana_user_settings.yaml")
    }
  }

  integrations_server = {
    topology = {
      size       = var.integrations_server_size
      zone_count = var.integrations_server_count
    }
  }

}

resource "ec_deployment_elasticsearch_keystore" "client_secret" {
  deployment_id = ec_deployment.ecaas_deployment.id
  setting_name  = "xpack.security.authc.realms.oidc.oidc1.rp.client_secret"
  value         = var.ec_keystore_client_secret
}

dimuon commented 11 months ago

@saimantr, you need to move the client_secret definition to the new keystore_contents attribute of ec_deployment. Please refer to the example for more details.
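
For illustration, a minimal sketch of what that move might look like for the config shared above. It assumes keystore_contents sits under the elasticsearch attribute as a map keyed by setting name, with each entry carrying a value. That placement and shape are assumptions and are not confirmed anywhere in this thread, so check the ec_deployment documentation for provider 0.9.0 or later before relying on it. The topology is trimmed to a single hot tier and the empty vars map passed to templatefile is a placeholder.

```hcl
resource "ec_deployment" "ecaas_deployment" {
  name                   = "test123"
  region                 = var.region
  version                = var.ec_version
  deployment_template_id = var.deployment_template_id

  elasticsearch = {
    hot = {
      size        = var.hot_size
      zone_count  = var.hot_count
      autoscaling = {}
    }

    config = {
      # Placeholder vars map; pass whatever the template actually needs.
      user_settings_yaml = templatefile("./ec_user_settings.yaml.tftpl", {})
    }

    # Assumed shape: a map of setting name => { value = ... }.
    # The intent is that the secret is sent with the deployment itself,
    # so it is present when Elasticsearch first boots with the OIDC realm.
    keystore_contents = {
      "xpack.security.authc.realms.oidc.oidc1.rp.client_secret" = {
        value = var.ec_keystore_client_secret
      }
    }
  }

  kibana = {
    config = {
      user_settings_yaml = file("./kibana_user_settings.yaml")
    }
  }
}

# The separate ec_deployment_elasticsearch_keystore "client_secret" resource
# for this setting would then be removed from the configuration.
```

With the secret declared inside the deployment resource, there is no second resource waiting on the deployment to finish, which is the ordering problem described at the top of this issue.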