Unable to connect to secure elastic search in kubernetes. #8433

Closed LilMonk closed 7 months ago

LilMonk commented 1 year ago

**Describe the bug**
I'm trying to set up DataHub on Kubernetes. My environment contains Postgres, Kafka (Strimzi), and Elasticsearch (using the official Elasticsearch operator). I have enabled TLS/SSL on Kafka and Elasticsearch. I'm able to connect to Kafka but not to Elasticsearch.

This is my YAML:

apiVersion: batch/v1
kind: Job
  name: datahub-datahub-system-update-job
  labels: "Helm" "datahub" 0.10.2 "datahub-0.2.164"
    # This is what defines this resource as a hook. Without this line, the
    # job is considered part of the release.
    "": pre-install,pre-upgrade
    "": "-4"
    "": before-hook-creation
        - name: datahub-kafka-certs-dir
            defaultMode: 0444
            secretName: kafka-user-certs
        - name: datahub-elasticsearch-certs-dir
            defaultMode: 0444
            secretName: elasticsearch-user-certs
        - name: cacerts
          emptyDir: {}
        - name: tls
            defaultMode: 256
            secretName: root-secret
      restartPolicy: Never
        fsGroup: 1000
        - name: init-cacerts
          image: "acryldata/datahub-upgrade:v0.10.4"
          - sh
          - -c
          - |
            cp -R /etc/ssl/certs/* /cacerts/
            cp /security/ca.crt /cacerts/ca.crt
          - mountPath: /cacerts
            name: cacerts
          - mountPath: /security
            name: tls
        - name: datahub-system-update-job
          image: "acryldata/datahub-upgrade:v0.10.4"
          imagePullPolicy: IfNotPresent
            - "-u"
            - "SystemUpdate"
            - name: DATAHUB_REVISION
              value: "1"
              value: /datahub/datahub-gms/resources/entity-registry.yml
            - name: DATAHUB_GMS_HOST
              value: datahub-datahub-gms
            - name: DATAHUB_GMS_PORT
              value: "8080"
            - name: DATAHUB_MAE_CONSUMER_HOST
              value: datahub-datahub-mae-consumer
            - name: DATAHUB_MAE_CONSUMER_PORT
              value: "9091"
              value: "postgres"
                  name: ""
                  key: "password"
            - name: EBEAN_DATASOURCE_HOST
              value: "postgres.postgres"
            - name: EBEAN_DATASOURCE_URL
              value: "jdbc:postgresql://postgres.postgres:5432/datahub"
            - name: EBEAN_DATASOURCE_DRIVER
              value: "org.postgresql.Driver"
            - name: KAFKA_BOOTSTRAP_SERVER
              value: "kafka-kafka-ingresstls-bootstrap.kafka:9093"
            - name: KAFKA_SCHEMAREGISTRY_URL
              value: "http://schema-registry.kafka:8081"
            - name: ELASTICSEARCH_HOST
              value: "elasticsearch-es-http.elasticsearch"
            - name: ELASTICSEARCH_PORT
              value: "9200"
            - name: SKIP_ELASTICSEARCH_CHECK
              value: "false"
            - name: ELASTICSEARCH_INSECURE
              value: "true"
            - name: ELASTICSEARCH_USE_SSL
              value: "true"
            - name: ELASTICSEARCH_USERNAME
              value: elastic_user
            - name: ELASTICSEARCH_PASSWORD
              value: "elastic_pass"
              value: "TLSv1.2"
              value: "/mnt/datahub/certs/elasticsearch/truststore.jks"
                  name: elasticsearch-user-certs
                  key: truststore.password
              value: "JKS"
              value: "/mnt/datahub/certs/elasticsearch/keystore.jks"
                  name: elasticsearch-user-certs
                  key: keystore.password
              value: "JKS"
                  name: elasticsearch-user-certs
                  key: keystore.password
            - name: GRAPH_SERVICE_IMPL
              value: elasticsearch
              value: "SSL"
              value: "/mnt/datahub/certs/kafka/truststore.jks"
              value: "SSL"
              value: ""
              value: "/mnt/datahub/certs/kafka/keystore.jks"
              value: "JKS"
              value: "TLS"
              value: "/mnt/datahub/certs/kafka/truststore.jks"
              value: "JKS"
                  name: kafka-user-certs
                  key: keystore.password
                  name: kafka-user-certs
                  key: keystore.password
                  name: kafka-user-certs
                  key: truststore.password
            - name: METADATA_CHANGE_EVENT_NAME
              value: MetadataChangeEvent_v4
              value: FailedMetadataChangeEvent_v4
            - name: METADATA_AUDIT_EVENT_NAME
              value: MetadataAuditEvent_v4
              value: MetadataChangeProposal_v1
              value: FailedMetadataChangeProposal_v1
              value: MetadataChangeLog_Versioned_v1
              value: MetadataChangeLog_Timeseries_v1
              value: DataHubUpgradeHistory_v1
              value: "true"
            - name: SCHEMA_REGISTRY_TYPE
              value: "KAFKA"
              value: "true"
              value: "true"
              value: "true"
            - name: datahub-kafka-certs-dir
              mountPath: /mnt/datahub/certs/kafka
            - name: datahub-elasticsearch-certs-dir
              mountPath: /mnt/datahub/certs/elasticsearch
            - mountPath: /etc/ssl/certs
              name: cacerts
              cpu: 500m
              memory: 512Mi
              cpu: 300m
              memory: 256Mi

The error log:

2023-07-17 13:16:07,710 [main] INFO  io.ebean.EbeanVersion:31 - ebean version: 11.33.3
2023-07-17 13:16:07,811 [main] INFO - loaded properties from [application.yml]
2023-07-17 13:16:08,023 [main] INFO  i.e.datasource.pool.ConnectionPool:294 - DataSourcePool [gmsEbeanServiceConfig] autoCommit[false] transIsolation[READ_COMMITTED] min[2] max[50]
2023-07-17 13:16:10,202 [main] INFO  io.ebean.internal.DefaultContainer:208 - DatabasePlatform name:gmsEbeanServiceConfig platform:postgres
2023-07-17 13:16:11,624 [main] INFO  c.l.g.f.k.s.KafkaSchemaRegistryFactory:61 - creating schema registry config using url: http://schema-registry.kafka:8081
2023-07-17 13:16:15,607 [kafka-producer-network-thread | producer-1] INFO  org.apache.kafka.clients.Metadata:277 - [Producer clientId=producer-1] Cluster ID: K3DZrqfxRhOmUjTYbE1xlg
2023-07-17 13:16:24,920 [main] WARN  c.l.m.m.r.PluginEntityRegistryLoader:44 - /etc/datahub/plugins/models directory does not exist or is not a directory. Plugin scanning will be disabled.
2023-07-17 13:16:25,612 [main] INFO  c.l.m.m.r.MergedEntityRegistry:99 - dataHubPolicyKey schema is compatible with previous schema
2023-07-17 13:16:26,704 [main] WARN  c.l.r.t.h.client.HttpClientFactory:917 - No scheduled executor is provided to HttpClientFactory, using it's own scheduled executor.
2023-07-17 13:16:32,004 [main] INFO  c.l.g.f.s.ElasticSearchServiceFactory:56 - Search configuration: SearchConfiguration(maxTermBucketSize=20, exactMatch=ExactMatchConfiguration(exclusive=false, withPrefix=true, prefixFactor=1.6, exactFactor=10.0, caseSensitivityFactor=0.7, enableStructured=true), partial=PartialConfiguration(urnFactor=0.5, factor=0.4), custom=CustomConfiguration(enabled=false, file=search_config.yml), graph=GraphQueryConfiguration(timeoutSeconds=50, batchSize=1000, maxResult=10000))
2023-07-17 13:16:32,111 [main] INFO - Custom search configuration disabled.
2023-07-17 13:16:32,508 [main] INFO  c.l.g.f.k.s.DUHESchemaRegistryFactory:29 - DataHub System Update Registry
2023-07-17 13:16:37,911 [main] WARN  c.d.p.configuration.ConfigProvider:39 - Configuration config.yml file not found at location /etc/datahub/plugins/auth
2023-07-17 13:16:37,912 [main] INFO  c.l.g.f.auth.AuthorizerChainFactory:75 - Default DataHubAuthorizer is enabled. Appending it to the authorization chain.
2023-07-17 13:16:38,006 [main] INFO  c.l.g.f.k.KafkaEventConsumerFactory:100 - Event-based KafkaListenerContainerFactory built successfully. Consumer concurrency = 1
2023-07-17 13:16:38,013 [main] INFO  c.l.g.f.k.KafkaEventConsumerFactory:116 - Event-based DUHE KafkaListenerContainerFactory built successfully. Consumer concurrency = 1
2023-07-17 13:16:38,016 [main] INFO  c.l.g.f.k.SimpleKafkaConsumerFactory:48 - Simple KafkaListenerContainerFactory built successfully
2023-07-17 13:16:40,521 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:16 - Starting upgrade with id SystemUpdate...
2023-07-17 13:16:40,523 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:16 - Executing Step 1/5: BuildIndicesPreStep...
2023-07-17 13:16:41,111 [main] ERROR c.l.d.u.s.e.s.BuildIndicesPreStep:81 - BuildIndicesPreStep failed. PKIX path building failed: unable to find valid certification path to requested target
    at org.elasticsearch.client.RestClient.extractAndWrapCause(
    at org.elasticsearch.client.RestClient.performRequest(
    at org.elasticsearch.client.RestClient.performRequest(
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(
    at org.elasticsearch.client.RestHighLevelClient.performRequest(
    at org.elasticsearch.client.IndicesClient.exists(
    at com.linkedin.metadata.graph.elastic.ElasticSearchGraphService.getReindexConfigs(
    at com.linkedin.datahub.upgrade.system.elasticsearch.util.IndexUtils.getAllReindexConfigs(
    at com.linkedin.datahub.upgrade.system.elasticsearch.steps.BuildIndicesPreStep.lambda$executable$0(
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeStepInternal(
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.executeInternal(
    at com.linkedin.datahub.upgrade.impl.DefaultUpgradeManager.execute(
    at org.springframework.boot.SpringApplication.callRunner(
    at org.springframework.boot.SpringApplication.callRunners(
    at com.linkedin.datahub.upgrade.UpgradeCliApplication.main(
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.base/java.lang.reflect.Method.invoke(
    at org.springframework.boot.loader.Launcher.launch(
    at org.springframework.boot.loader.Launcher.launch(
    at org.springframework.boot.loader.JarLauncher.main(
Caused by: PKIX path building failed: unable to find valid certification path to requested target
    at java.base/
    at java.base/
    at java.base/
    at java.base/
    at java.base/$T13CertificateConsumer.checkServerCerts(
    at java.base/$T13CertificateConsumer.onConsumeCertificate(
    at java.base/$T13CertificateConsumer.consume(
    at java.base/
    at java.base/
    at java.base/$DelegatedTask$
    at java.base/$DelegatedTask$
    at java.base/ Method)
    at java.base/$
    at org.apache.http.nio.reactor.ssl.SSLIOSession.doRunTask(
    at org.apache.http.nio.reactor.ssl.SSLIOSession.doHandshake(
    at org.apache.http.nio.reactor.ssl.SSLIOSession.isAppInputReady(
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$
    at java.base/
Caused by: PKIX path building failed: unable to find valid certification path to requested target
    at java.base/
    at java.base/
    at java.base/
    at java.base/
    at java.base/
    at java.base/
    at java.base/$T13CertificateConsumer.checkServerCerts(
    ... 19 common frames omitted
Caused by: unable to find valid certification path to requested target
    at java.base/
    at java.base/
    at java.base/
    at java.base/
    ... 25 common frames omitted
2023-07-17 13:16:41,113 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:16 - Failed Step 1/5: BuildIndicesPreStep. Failed after 0 retries.
2023-07-17 13:16:41,113 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:16 - Exiting upgrade SystemUpdate with failure.
2023-07-17 13:16:41,114 [main] INFO  c.l.d.u.impl.DefaultUpgradeReport:16 - Upgrade SystemUpdate completed with result FAILED. Exiting...
2023-07-17 13:16:41,616 [SpringApplicationShutdownHook] INFO  o.a.k.clients.producer.KafkaProducer:1182 - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2023-07-17 13:16:41,616 [SpringApplicationShutdownHook] INFO  o.a.k.clients.producer.KafkaProducer:1182 - [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.

**To Reproduce**
Steps to reproduce the behavior:

  1. Deploy Elasticsearch using the Elasticsearch operator.
    kind: Elasticsearch
    name: elasticsearch
    namespace: elasticsearch
        secretName: es-ca-cert
    version: 7.17.7
      - secretName: elasticsearch-secret
    - name: default
      count: 1
      config: false
        - metadata:
            name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
              - ReadWriteOnce
                storage: 2Gi
            storageClassName: es-storage-class
  2. Use this script to generate the truststore and keystore.



echo "Current working dir: $(pwd)"

# ES_CRTS, ES_PASSWORD, ES_CLUSTER_NAMESPACE and TARGET_NAMESPACE must be set before this point.

# Create the working directory for the certificates if it does not exist yet.
makeCrtsDir () {
  [ -d "$ES_CRTS" ] || mkdir -p "$ES_CRTS"
}

# Remove any previously fetched certificates and start from an empty directory.
clearCrts () {
  rm -r "$ES_CRTS"
  makeCrtsDir
}

# Fetch the CA certificate, server certificate and private key from the operator-managed secrets.
getCrts () {
  kubectl get secret elasticsearch-es-http-certs-public \
    --namespace="$ES_CLUSTER_NAMESPACE" \
    --output=go-template='{{index .data "ca.crt" | base64decode }}' \
    > "$ES_CRTS/ca.crt"

  kubectl get secret elasticsearch-es-http-certs-public \
    --namespace="$ES_CLUSTER_NAMESPACE" \
    --output=go-template='{{index .data "tls.crt" | base64decode }}' \
    > "$ES_CRTS/tls.crt"

  kubectl get secret elasticsearch-es-http-certs-internal \
    --namespace="$ES_CLUSTER_NAMESPACE" \
    --output=go-template='{{index .data "tls.key" | base64decode }}' \
    > "$ES_CRTS/tls.key"
}

# Build keystore.jks (certificate + key) and truststore.jks (CA certificate).
createCrtsStore () {
  openssl pkcs12 -export \
    -in "$ES_CRTS/tls.crt" \
    -inkey "$ES_CRTS/tls.key" \
    -out "$ES_CRTS/keystore.p12" \
    -name elasticsearch \
    -CAfile "$ES_CRTS/ca.crt" \
    -caname elasticsearch \
    -password "pass:$ES_PASSWORD"

  keytool -importkeystore \
    -deststorepass "$ES_PASSWORD" \
    -destkeypass "$ES_PASSWORD" \
    -destkeystore "$ES_CRTS/keystore.jks" \
    -srckeystore "$ES_CRTS/keystore.p12" \
    -srcstoretype PKCS12 \
    -srcstorepass "$ES_PASSWORD" \
    -alias elasticsearch

  keytool -import \
    -trustcacerts \
    -alias root \
    -file "$ES_CRTS/ca.crt" \
    -keystore "$ES_CRTS/truststore.jks" \
    -storepass "$ES_PASSWORD" -noprompt
}

# Publish both stores and their passwords as a secret in the DataHub namespace.
createSecretAtTarget () {
  kubectl create secret generic elasticsearch-user-certs \
    --from-file="$ES_CRTS/keystore.jks" \
    --from-file="$ES_CRTS/truststore.jks" \
    --from-literal=keystore.password="$ES_PASSWORD" \
    --from-literal=truststore.password="$ES_PASSWORD" \
    --namespace="$TARGET_NAMESPACE"
}

makeCrtsDir
clearCrts
getCrts
createCrtsStore
createSecretAtTarget
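
The PKIX error in the log means the Java client could not build a chain from the server certificate to any CA it trusts. Before building the stores, it may help to rule out a mismatch between the fetched files. The following is a minimal sketch, assuming OpenSSL 1.1 or later (where `openssl verify` exits non-zero on failure); the helper name `verify_against_ca` is made up for illustration and is not part of the original script.

```shell
# Sketch: check that a certificate chains to a given CA file.
# verify_against_ca is a hypothetical helper, not part of the script above.
verify_against_ca() {
  ca_file="$1"
  cert_file="$2"
  # openssl verify exits non-zero when no valid path to the CA can be built.
  openssl verify -CAfile "$ca_file" "$cert_file" >/dev/null 2>&1
}

# Example usage with the files fetched by getCrts:
# verify_against_ca "$ES_CRTS/ca.crt" "$ES_CRTS/tls.crt" \
#   || echo "tls.crt does not chain to ca.crt"
```

If this check passes but the job still fails, the truststore is probably fine and the more likely problem is that the JVM is not reading it.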

**Expected behavior**
The job should have been successfully executed.

**Desktop (please complete the following information):**
 - OS: Pop!_OS 22.04 LTS x86_64

Please let me know if any additional information is required.

godocean commented 1 year ago

I met the same issue as you, not sure how to fix it.

cccadet commented 1 year ago

Same here.

shicholas commented 1 year ago

Same here

Gerrit-K commented 10 months ago

Not stale, same issue here with similar setup.

For the record, because I thought it just didn't pick up my environment variables, I tried these variants:

I could confirm by trial & error that the first two variable variants are indeed picked up, but the client doesn't accept the self-signed certificate from elasticsearch.

wei-jiang-dns53 commented 10 months ago

I run an Nginx HTTP proxy in front of Elasticsearch to work around this issue.

ozmoze commented 10 months ago

I had the same issue using datahub-gms:v0.12.0 along with elasticsearch 8.10.4.

After trying most of the Elasticsearch environment variable combinations suggested by @LilMonk and @Gerrit-K, I finally got it to work by setting the JAVA_OPTS env variable in the datahub-gms section.

  enabled: true
  /// truncated ///
      value: SSL
      value: PKCS12
      value: /elastic-certificates/truststore-elastic.p12
          name: elasticsearch-certs
          key: truststore.password
    - name: JAVA_OPTS

Definitely something that should be addressed in the DataHub Helm chart.

Gerrit-K commented 9 months ago

Thanks @ozmoze, that was a good hint. It didn't work for me right away, though, since the JAVA_OPTS variable (contrary to my prior belief) isn't a standardized option picked up by the JVM (see this comment). It's just often used in scripts, including the GMS startup script, but unfortunately not by the datahub-system-update-job that I was testing the connection with.
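
A quick way to see whether the `java` launcher itself reads a given environment variable is to pass a marker property through that variable and look for it in the JVM's property dump. This is a minimal sketch, assuming a JDK 9+ `java` binary on the PATH; the helper name `launcher_honors_env` and the property `datahub.probe` are made up for illustration.

```shell
# Sketch: does the java launcher pick up options from the named env variable?
# launcher_honors_env and datahub.probe are hypothetical names.
launcher_honors_env() {
  var_name="$1"
  # -XshowSettings:properties prints all system properties as "key = value".
  env "${var_name}=-Ddatahub.probe=1" java -XshowSettings:properties -version 2>&1 \
    | grep -qF 'datahub.probe = 1'
}

# Example usage:
# launcher_honors_env JAVA_OPTS || echo "java ignores JAVA_OPTS unless a script expands it"
```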

However, the variable JDK_JAVA_OPTIONS is a standardized option and is automatically picked up by the JVM. And with this, I was finally able to get it to work with these values:

values.yaml
```yaml
.esSslCaCertVolume: &esSslCaCertVolume
  name: es-ca-certs
  secret:
    secretName: your-eck-elasticsearch-es-http-certs-public
.esSslCaCertVolumeMount: &esSslCaCertVolumeMount
  name: es-ca-certs
  mountPath: /mnt/es-ca-certs
.esSslTruststoreVolume: &esSslTruststoreVolume
  name: es-truststore
  emptyDir: {}
.esSslTruststoreVolumeMount: &esSslTruststoreVolumeMount
  name: es-truststore
  mountPath: /mnt/es-truststore
.esSslTruststoreFileEnv: &esSslTruststoreFileEnv
  name: ELASTICSEARCH_SSL_TRUSTSTORE_FILE
  value: /mnt/es-truststore/ca.p12
.esSslTruststoreTypeEnv: &esSslTruststoreTypeEnv
  name: ELASTICSEARCH_SSL_TRUSTSTORE_TYPE
  value: PKCS12
.esSslTruststorePasswordEnv: &esSslTruststorePasswordEnv
  name: ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD
  value: datahub
.esSslJdkJavaOptionsEnv: &esSslJdkJavaOptionsEnv
  name: JDK_JAVA_OPTIONS
  value: "$(ELASTICSEARCH_SSL_TRUSTSTORE_FILE)$(ELASTICSEARCH_SSL_TRUSTSTORE_TYPE)$(ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD)"
.esSslTruststoreInitContainer: &esSslTruststoreInitContainer
  name: convert-certs
  image: openjdk
  volumeMounts:
    - *esSslCaCertVolumeMount
    - *esSslTruststoreVolumeMount
  env:
    - *esSslTruststoreFileEnv
    - *esSslTruststorePasswordEnv
  command:
    - sh
    - -c
    - 'keytool -importcert -storetype PKCS12 -trustcacerts -noprompt -file /mnt/es-ca-certs/ca.crt -keystore "$ELASTICSEARCH_SSL_TRUSTSTORE_FILE" -storepass "$ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD"'

datahub:
  global:
    elasticsearch:
      host: your-eck-elasticsearch-es-http
      useSSL: "true"
      skipcheck: "true" # skips waiting for elasticsearch in "dockerize", as that cannot handle self-signed certs
      auth:
        username: &esUser elastic # FIXME: the upstream chart doesn't support reading this from the secret yet
        password:
          secretRef: your-eck-elasticsearch-es-elastic-user
          secretKey: *esUser
  elasticsearchSetupJob:
    enabled: true
    extraVolumes:
      - *esSslCaCertVolume
    extraVolumeMounts:
      - *esSslCaCertVolumeMount
    extraEnvs:
      - name: CURL_CA_BUNDLE
        value: /mnt/es-ca-certs/ca.crt
  datahub-gms:
    extraVolumes:
      - *esSslCaCertVolume
      - *esSslTruststoreVolume
    extraVolumeMounts:
      - *esSslCaCertVolumeMount
      - *esSslTruststoreVolumeMount
    extraEnvs:
      - *esSslTruststoreFileEnv
      - *esSslTruststoreTypeEnv
      - *esSslTruststorePasswordEnv
      - *esSslJdkJavaOptionsEnv
    extraInitContainers:
      - *esSslTruststoreInitContainer
  datahub-mce-consumer:
    extraVolumes:
      - *esSslCaCertVolume
      - *esSslTruststoreVolume
    extraVolumeMounts:
      - *esSslCaCertVolumeMount
      - *esSslTruststoreVolumeMount
    extraEnvs:
      - *esSslTruststoreFileEnv
      - *esSslTruststoreTypeEnv
      - *esSslTruststorePasswordEnv
      - *esSslJdkJavaOptionsEnv
    extraInitContainers:
      - *esSslTruststoreInitContainer
  datahub-mae-consumer:
    extraVolumes:
      - *esSslCaCertVolume
      - *esSslTruststoreVolume
    extraVolumeMounts:
      - *esSslCaCertVolumeMount
      - *esSslTruststoreVolumeMount
    extraEnvs:
      - *esSslTruststoreFileEnv
      - *esSslTruststoreTypeEnv
      - *esSslTruststorePasswordEnv
      - *esSslJdkJavaOptionsEnv
    extraInitContainers:
      - *esSslTruststoreInitContainer
  datahub-frontend:
    extraVolumes:
      - *esSslCaCertVolume
      - *esSslTruststoreVolume
    extraVolumeMounts:
      - *esSslCaCertVolumeMount
      - *esSslTruststoreVolumeMount
    extraEnvs:
      - *esSslTruststoreFileEnv
      - *esSslTruststoreTypeEnv
      - *esSslTruststorePasswordEnv
      - *esSslJdkJavaOptionsEnv
    extraInitContainers:
      - *esSslTruststoreInitContainer
  datahubSystemUpdate:
    extraVolumes:
      - *esSslCaCertVolume
      - *esSslTruststoreVolume
    extraVolumeMounts:
      - *esSslTruststoreVolumeMount
    extraEnvs:
      - *esSslTruststoreFileEnv
      - *esSslTruststoreTypeEnv
      - *esSslTruststorePasswordEnv
      - *esSslJdkJavaOptionsEnv
    extraInitContainers:
      - *esSslTruststoreInitContainer
```

Needless to say, this is waaay too much logic for a Helm "values" file and should be integrated in the Helm chart.
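
The `JDK_JAVA_OPTIONS` entry above shows only the three variable references. For the JVM to use them, each has to be attached to a system property. This is a minimal sketch of how such a value could be assembled, assuming the standard JSSE properties `javax.net.ssl.trustStore`, `javax.net.ssl.trustStoreType` and `javax.net.ssl.trustStorePassword` are what is intended; the function name `build_truststore_java_options` is made up for illustration.

```shell
# Sketch: assemble a JDK_JAVA_OPTIONS value from the three truststore settings.
# The -Djavax.net.ssl.* names are standard JSSE system properties;
# build_truststore_java_options is a hypothetical helper name.
build_truststore_java_options() {
  store_file="$1"
  store_type="$2"
  store_pass="$3"
  printf '%s %s %s\n' \
    "-Djavax.net.ssl.trustStore=${store_file}" \
    "-Djavax.net.ssl.trustStoreType=${store_type}" \
    "-Djavax.net.ssl.trustStorePassword=${store_pass}"
}

# Example usage with the path, type and password from the values above:
# export JDK_JAVA_OPTIONS="$(build_truststore_java_options /mnt/es-truststore/ca.p12 PKCS12 datahub)"
```

The launcher splits `JDK_JAVA_OPTIONS` on whitespace, so a path or password containing spaces would need quoting inside the value.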

LilMonk commented 9 months ago

I ended up using the Istio service mesh with mTLS. Even though all the components themselves speak plain HTTP, the Envoy proxy ensures that communication between the services goes over HTTPS.

fcecagno commented 1 month ago

Just used instructions on to be able to use Elasticsearch, it should definitely be part of the chart to avoid so many configurations just to allow using a custom CA.