elastic / terraform-provider-ec

https://registry.terraform.io/providers/elastic/ec/latest/docs
Apache License 2.0

[Bug] New cluster creation fails when using OIDC settings #455

Closed arunsudhakar closed 2 years ago

arunsudhakar commented 2 years ago

When trying to set up a new cluster with OIDC connectivity enabled, the cluster fails to deploy and remains in an unhealthy state. The deployment works if the cluster is first created with OIDC disabled (with the OIDC secret still set using ec_deployment_elasticsearch_keystore), allowed to come online, and then re-applied with OIDC enabled. The logs suggest the cause is a dependency between the OIDC settings in user_settings_yaml and the keystore: user_settings_yaml should contain the OIDC configuration only after the keystore holds the secret. This dependency might need to be handled within the provider.
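
The two-step workaround described above could be sketched roughly as follows. This is only an illustration, not a provider feature: the `enable_oidc` and `oidc_realm_yaml` variables are hypothetical names introduced here, and the remaining variables are assumed to be declared as in the configuration from the reproduction steps.

```hcl
# Hypothetical toggle (not part of the original report): keep it false on the first apply.
variable "enable_oidc" {
  type    = bool
  default = false
}

# Hypothetical variable holding the OIDC realm YAML from the reproduction steps.
variable "oidc_realm_yaml" {
  type = string
}

resource "ec_deployment" "cluster" {
  region                 = var.region
  name                   = var.deployment_name
  version                = var.es_version
  deployment_template_id = var.deployment_template_id

  elasticsearch {
    config {
      # First apply: no realm settings, so the node can start without the secret.
      # Second apply: the realm is added once the keystore entry already exists.
      user_settings_yaml = var.enable_oidc ? var.oidc_realm_yaml : null
    }
  }

  kibana {}
}

# Created on the first apply, after the deployment is online.
resource "ec_deployment_elasticsearch_keystore" "oidc_secret" {
  deployment_id = ec_deployment.cluster.id
  setting_name  = "xpack.security.authc.realms.oidc.atwin.rp.client_secret"
  value         = var.oidc_secret
}
```

Under these assumptions, a plain `terraform apply` followed by `terraform apply -var enable_oidc=true` would mirror the manual sequence that was observed to work. It does not address the underlying ordering problem in a single apply.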

To Reproduce

Steps to reproduce the behavior:

  1. TF configuration used:
resource "ec_deployment" "cluster" {
  region                 = var.region
  name                   = var.deployment_name
  version                = var.es_version
  deployment_template_id = var.deployment_template_id

  elasticsearch {
    // topology {
    //   id = "hot_content"
    //   zone_count = 1
    // }
    config {
      // Substitute the <cluster> and <kc_url> placeholders in the realm settings.
      user_settings_yaml = replace(replace("xpack.security.authc.realms.oidc.atwin:\n  order: 2\n  rp.client_id: \"atwin-client\"\n  rp.response_type: code\n  rp.redirect_uri: \"https://<cluster>.kb.southeastasia.azure.elastic-cloud.com:9243/api/security/oidc/callback\"\n  op.issuer: \"<kc_url>/auth/realms/atwin\"\n  op.authorization_endpoint: \"<kc_url>/auth/realms/atwin/protocol/openid-connect/auth\"\n  op.token_endpoint: \"<kc_url>/auth/realms/atwin/protocol/openid-connect/token\"\n  op.jwkset_path: \"<kc_url>/auth/realms/atwin/protocol/openid-connect/certs\"\n  op.userinfo_endpoint: \"<kc_url>/auth/realms/atwin/protocol/openid-connect/userinfo\"\n  op.endsession_endpoint: \"<kc_url>/auth/realms/atwin/protocol/openid-connect/logout\"\n  rp.post_logout_redirect_uri: \"https://<cluster>.kb.southeastasia.azure.elastic-cloud.com:9243/security/logged_out\"\n  claims.principal: sub\n  ", "<cluster>", var.deployment_name), "<kc_url>", var.keycloak_url)
    }
  }

  kibana {
    // topology {
    //   zone_count = 1
    // }
    config {
      user_settings_yaml = "xpack.security.authc.providers:\n  basic.basic1:\n    order: 0\n    icon: \"logoElasticsearch\"\n    hint: \"Typically for administrators\"\n  oidc.atwin:\n    order: 1\n    icon: \"https://design.jboss.org/keycloak/logo/images/keycloak_icon_64px.png\"\n    realm: atwin\n    description: \"Log in with Keycloak\""
    }
  }
}

resource "elasticstack_elasticsearch_security_role" "oidc_user_role" {
  name    = "oidc_user_role"
  cluster = ["all"]

  indices {
    names      = ["*"]
    privileges = ["all"]
  }

  applications {
    application = "elasticsearch"
    privileges  = ["admin", "read"]
    resources   = ["*"]
  }

  metadata = jsonencode({
    version = 1
  })

  elasticsearch_connection {
    endpoints = [ec_deployment.cluster.elasticsearch[0].https_endpoint]
    username  = ec_deployment.cluster.elasticsearch_username
    password  = ec_deployment.cluster.elasticsearch_password
  }
}

resource "ec_deployment_elasticsearch_keystore" "oidc_secret" {
  deployment_id = ec_deployment.cluster.id
  setting_name  = "xpack.security.authc.realms.oidc.atwin.rp.client_secret"
  value         = var.oidc_secret
}
  2. Terraform Apply
  3. See the error in the output
[instance-0000000000] Exception java.lang.IllegalStateException: security initialization failed
    at org.elasticsearch.xpack.security.Security.createComponents(Security.java:571) ~[?:?]
    at org.elasticsearch.node.Node.lambda$new$18(Node.java:736) ~[elasticsearch-7.17.1.jar:7.17.1]
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273) ~[?:?]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
    at org.elasticsearch.node.Node.<init>(Node.java:750) ~[elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.node.Node.<init>(Node.java:309) ~[elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234) ~[elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234) ~[elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434) [elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166) [elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) [elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) [elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) [elasticsearch-cli-7.17.1.jar:7.17.1]
    at org.elasticsearch.cli.Command.main(Command.java:77) [elasticsearch-cli-7.17.1.jar:7.17.1]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) [elasticsearch-7.17.1.jar:7.17.1]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) [elasticsearch-7.17.1.jar:7.17.1]
Caused by: org.elasticsearch.common.settings.SettingsException: The configuration setting [xpack.security.authc.realms.oidc.atwin.rp.client_secret] is required
    at org.elasticsearch.xpack.security.authc.oidc.OpenIdConnectRealm.buildRelyingPartyConfiguration(OpenIdConnectRealm.java:263) ~[?:?]
    at org.elasticsearch.xpack.security.authc.oidc.OpenIdConnectRealm.<init>(OpenIdConnectRealm.java:104) ~[?:?]
    at org.elasticsearch.xpack.security.authc.InternalRealms.lambda$getFactories$7(InternalRealms.java:155) ~[?:?]
    at org.elasticsearch.xpack.security.authc.Realms.initRealms(Realms.java:319) ~[?:?]
    at org.elasticsearch.xpack.security.authc.Realms.<init>(Realms.java:91) ~[?:?]
    at org.elasticsearch.xpack.security.Security.createComponents(Security.java:675) ~[?:?]
    at org.elasticsearch.xpack.security.Security.createComponents(Security.java:560) ~[?:?]
    ... 20 more

Expected behavior

The cluster should be set up with OIDC connectivity enabled.

Debug output

Run the terraform command with TF_LOG=trace and provide extended information on TF operations.


2022-03-11T08:07:38.444+0800 [ERROR] vertex "ec_deployment.cluster" error: failed tracking create progress: 3 errors occurred:
    * found deployment plan errors: deployment [412c94258e0c0ce4afc22c0970cc81e6] - [elasticsearch][79b0984b27564d74a197d8b651129545]: caught error: "Plan change failed: Some instances were not running"
    * found deployment plan errors: deployment [412c94258e0c0ce4afc22c0970cc81e6] - [kibana][c3114b345b4940c7b0a4a4b753f84547]: caught error: "Plan change failed: Cluster not reachable: [elasticsearch]... Please validate the cluster is in a healthy state and retry."
    * set "request_id" to "co81gk4duczl6guvtblmxx4lotn16c7y9noudo61pfgb5c7c3h2mxvgjz79x3x43" to recreate the deployment resources
╷
│ Error: failed tracking create progress: 3 errors occurred:
│   * found deployment plan errors: deployment [412c94258e0c0ce4afc22c0970cc81e6] - [elasticsearch][79b0984b27564d74a197d8b651129545]: caught error: "Plan change failed: Some instances were not running"
│   * found deployment plan errors: deployment [412c94258e0c0ce4afc22c0970cc81e6] - [kibana][c3114b345b4940c7b0a4a4b753f84547]: caught error: "Plan change failed: Cluster not reachable: [elasticsearch]... Please validate the cluster is in a healthy state and retry."
│   * set "request_id" to "co81gk4duczl6guvtblmxx4lotn16c7y9noudo61pfgb5c7c3h2mxvgjz79x3x43" to recreate the deployment resources
│ 
│ 
│ 
│   with ec_deployment.cluster,
│   on elastic-cloud.tf line 1, in resource "ec_deployment" "cluster":
│    1: resource "ec_deployment" "cluster" {
│ 

Versions (please complete the following information):

olksdr commented 2 years ago

This error is coming from terraform-provider-ec. I will transfer this issue to that repo.

arunsudhakar commented 2 years ago

Duplicate of #433. Marking as closed.