airbyte-temporal failed to connect to RDS when rds.force_ssl enabled in RDS #39636

Closed. hongbo-miao closed this issue 2 months ago.

hongbo-miao commented 3 months ago

Helm Chart Version

0.199.0

What step the error happened?

On deploy

Relevant information

Originally posted on Stack Overflow; here is a copy:


I am trying to deploy Airbyte in Kubernetes (Amazon EKS) with external Postgres (Amazon RDS).

I am using Airbyte Helm chart version 0.199.0, as noted above.

Group 1

Experiment 1-1 (Succeeded, with rds.force_ssl disabled)

When I disabled rds.force_ssl in the Amazon RDS parameter group (rds.force_ssl: 0), Airbyte deployed successfully. It is worth mentioning that the pod "airbyte-temporal" could talk to RDS successfully.
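
For reference, toggling that setting with the AWS CLI would look roughly like this (a sketch; the parameter group name is a placeholder for whatever group is attached to the RDS instance):

# Placeholder parameter group name; rds.force_ssl is a dynamic parameter,
# so ApplyMethod=immediate applies without a reboot.
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-rds-parameter-group \
  --parameters "ParameterName=rds.force_ssl,ParameterValue=0,ApplyMethod=immediate"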

Here is my Airbyte Helm my-values.yaml:

global:
  state:
    storage:
      type: S3
  database:
    type: external
    host: production-hm-airbyte-postgres.xxx.us-west-2.rds.amazonaws.com
    port: 5432
    database: airbyte_db
    user: airbyte_user
    secretName: production-hm-airbyte-secret
    passwordSecretKey: POSTGRES_PASSWORD
  logs:
    accessKey:
      existingSecret: production-hm-airbyte-secret
      existingSecretKey: AWS_ACCESS_KEY_ID
    secretKey:
      existingSecret: production-hm-airbyte-secret
      existingSecretKey: AWS_SECRET_ACCESS_KEY
  storage:
    type: S3
    bucket:
      activityPayload: production-hm-airbyte
      log: production-hm-airbyte
      state: production-hm-airbyte
      workloadOutput: production-hm-airbyte
  minio:
    enabled: false
server:
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_ACCESS_KEY_ID
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_SECRET_ACCESS_KEY
    - name: STATE_STORAGE_S3_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_ACCESS_KEY_ID
    - name: STATE_STORAGE_S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_SECRET_ACCESS_KEY
    - name: STATE_STORAGE_S3_BUCKET_NAME
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: LOG_S3_BUCKET_NAME
    - name: STATE_STORAGE_S3_REGION
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: LOG_S3_BUCKET_REGION
worker:
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_ACCESS_KEY_ID
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_SECRET_ACCESS_KEY
    - name: STATE_STORAGE_S3_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_ACCESS_KEY_ID
    - name: STATE_STORAGE_S3_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: AWS_SECRET_ACCESS_KEY
    - name: STATE_STORAGE_S3_BUCKET_NAME
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: LOG_S3_BUCKET_NAME
    - name: STATE_STORAGE_S3_REGION
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: LOG_S3_BUCKET_REGION
    - name: AWS_DEFAULT_REGION
      valueFrom:
        secretKeyRef:
          name: production-hm-airbyte-secret
          key: LOG_S3_BUCKET_REGION
postgresql:
  enabled: false
externalDatabase:
  host: production-hm-airbyte-postgres.xxx.us-west-2.rds.amazonaws.com
  port: 5432
  database: airbyte_db
  user: airbyte_user
  existingSecret: production-hm-airbyte-secret
  existingSecretPasswordKey: POSTGRES_PASSWORD
  jdbcUrl: jdbc:postgresql://production-hm-airbyte-postgres.xxx.us-west-2.rds.amazonaws.com:5432/airbyte_db?ssl=true&sslmode=require
temporal:
  extraEnv:
    # https://github.com/temporalio/docker-builds/blob/main/docker/auto-setup.sh
    # Boolean values below need to be passed as strings
    - name: SKIP_DB_CREATE
      value: "true"
    - name: DBNAME
      value: temporal_db
    - name: VISIBILITY_DBNAME
      value: temporal_visibility_db
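
For completeness, deploying this chart version with the values above would look something like the following (a sketch; it assumes the airbyte Helm repo has already been added, and the release name is hypothetical):

helm upgrade --install airbyte airbyte/airbyte \
  --namespace production-hm-airbyte \
  --version 0.199.0 \
  --values my-values.yaml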

Here is the successful log from pod "airbyte-temporal":

https://gist.github.com/hongbo-miao/eb5dcc71ad60aa38d285a5ed816128ed

Experiment 1-2 (Failed with rds.force_ssl enabled)

I do want to enable rds.force_ssl. When I set rds.force_ssl: 1 with the same my-values.yaml from Experiment 1-1, pod "airbyte-temporal" failed to deploy with this error:

TEMPORAL_ADDRESS is not set, setting it to 172.31.45.243:7233
PostgreSQL started.
Setup PostgreSQL schema.
2024-06-18T21:21:01.292Z  ERROR Unable to connect to SQL database.  {"error": "pq: no pg_hba.conf entry for host \"172.31.45.243\", user \"airbyte_user\", database \"temporal\", no encryption", "logging-call-at": "handler.go:52"}
2024/06/18 21:21:01 Loading config; env=docker,zone=,configDir=config
2024/06/18 21:21:01 Loading config files=[config/docker.yaml]
{"level":"info","ts":"2024-06-18T21:21:01.416Z","msg":"Build info.","git-time":"2024-03-22T16:43:28.000Z","git-revision":"92489dd75f17a2daa0a537278c8b6337f71fd704","git-modified":true,"go-arch":"amd64","go-os":"linux","go-version":"go1.22.1","cgo-enabled":false,"server-version":"1.23.0-rc16","debug-mode":false,"logging-call-at":"main.go:148"}
{"level":"info","ts":"2024-06-18T21:21:01.416Z","msg":"dynamic config changed for the key: frontend.enableclientversioncheck oldValue: nil newValue: { constraints: {} value: true }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.416Z","msg":"dynamic config changed for the key: history.historymgrnumconns oldValue: nil newValue: { constraints: {} value: 50 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.416Z","msg":"dynamic config changed for the key: system.advancedvisibilitywritingmode oldValue: nil newValue: { constraints: {} value: off }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.417Z","msg":"dynamic config changed for the key: history.defaultactivityretrypolicy oldValue: nil newValue: { constraints: {} value: map[BackoffCoefficient:2 InitialIntervalInSeconds:1 MaximumAttempts:0 MaximumIntervalCoefficient:100] }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.417Z","msg":"dynamic config changed for the key: limit.blobsize.warn oldValue: nil newValue: { constraints: {} value: 10485760 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.417Z","msg":"dynamic config changed for the key: frontend.historymgrnumconns oldValue: nil newValue: { constraints: {} value: 30 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.417Z","msg":"dynamic config changed for the key: history.defaultworkflowretrypolicy oldValue: nil newValue: { constraints: {} value: map[BackoffCoefficient:2 InitialIntervalInSeconds:1 MaximumAttempts:0 MaximumIntervalCoefficient:100] }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.417Z","msg":"dynamic config changed for the key: frontend.persistencemaxqps oldValue: nil newValue: { constraints: {} value: 3000 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.428Z","msg":"dynamic config changed for the key: frontend.throttledlogrps oldValue: nil newValue: { constraints: {} value: 20 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.428Z","msg":"dynamic config changed for the key: history.persistencemaxqps oldValue: nil newValue: { constraints: {} value: 3000 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.428Z","msg":"dynamic config changed for the key: limit.blobsize.error oldValue: nil newValue: { constraints: {} value: 15728640 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-18T21:21:01.429Z","msg":"Updated dynamic config","logging-call-at":"file_based_client.go:195"}
{"level":"warn","ts":"2024-06-18T21:21:01.429Z","msg":"Not using any authorizer and flag `--allow-no-auth` not detected. Future versions will require using the flag `--allow-no-auth` if you do not want to set an authorizer.","logging-call-at":"main.go:178"}

Experiment 1-3 (Partially failed with rds.force_ssl enabled when passing the CA PEM file)

I downloaded Amazon RDS's global-bundle.pem from https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html#UsingWithRDS.SSL.CertificatesAllRegions

Then I deployed this ConfigMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: production-hm-airbyte-config-map
  namespace: production-hm-airbyte
data:
  # https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html#UsingWithRDS.SSL.CertificatesAllRegions
  amazon-rds-ca-global-bundle.pem: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
    ...
    ...
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----

Based on the auto-setup source code, I found POSTGRES_TLS_ENABLED, POSTGRES_TLS_DISABLE_HOST_VERIFICATION, and POSTGRES_TLS_CA_FILE. (It also supports POSTGRES_TLS_CERT_FILE and POSTGRES_TLS_KEY_FILE.)

I updated the temporal section of my Airbyte Helm my-values.yaml to:

# ...
temporal:
  extraVolumes:
    - name: airbyte-config-map-volume
      configMap:
        name: production-hm-airbyte-config-map
  extraVolumeMounts:
    - name: airbyte-config-map-volume
      subPath: amazon-rds-ca-global-bundle.pem
      mountPath: /etc/ssl/certs/amazon-rds-ca-global-bundle.pem
  extraEnv:
    # https://github.com/temporalio/docker-builds/blob/main/docker/auto-setup.sh
    # Boolean values below need to be passed as strings
    - name: SKIP_DB_CREATE
      value: "true"
    - name: DBNAME
      value: temporal_db
    - name: VISIBILITY_DBNAME
      value: temporal_visibility_db
    - name: POSTGRES_TLS_ENABLED
      value: "true"
    - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
      value: "false"
    - name: POSTGRES_TLS_CA_FILE
      value: /etc/ssl/certs/amazon-rds-ca-global-bundle.pem

I can confirm that pod "airbyte-temporal" picked up the file amazon-rds-ca-global-bundle.pem correctly: if the path were wrong, it would throw a path-related error saying the file cannot be found.
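
One way to double-check the mount from outside the container (a sketch; it assumes the Deployment is named airbyte-temporal):

kubectl --namespace production-hm-airbyte exec deploy/airbyte-temporal -- \
  ls -l /etc/ssl/certs/amazon-rds-ca-global-bundle.pem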

Now the pod "airbyte-temporal" log is different; it seems to fail at a later "sql schema version compatibility check" step.

TEMPORAL_ADDRESS is not set, setting it to 172.31.37.167:7233
PostgreSQL started.
Setup PostgreSQL schema.
2024-06-19T21:08:41.032Z  INFO  Starting schema setup {"config": {"SchemaFilePath":"","SchemaName":"","InitialVersion":"0.0","Overwrite":false,"DisableVersioning":false}, "logging-call-at": "setuptask.go:63"}
2024-06-19T21:08:41.032Z  DEBUG Setting up version tables {"logging-call-at": "setuptask.go:73"}
2024-06-19T21:08:41.078Z  DEBUG Current database schema version 1.11 is greater than initial schema version 0.0. Skip version upgrade {"logging-call-at": "setuptask.go:134"}
2024-06-19T21:08:41.079Z  INFO  Schema setup complete {"logging-call-at": "setuptask.go:149"}
2024-06-19T21:08:41.223Z  INFO  UpdateSchemaTask started {"config": {"DBName":"","TargetVersion":"","SchemaDir":"/etc/temporal/schema/postgresql/v96/temporal/versioned","SchemaName":"","IsDryRun":false}, "logging-call-at": "updatetask.go:102"}
2024-06-19T21:08:41.228Z  DEBUG Schema Dirs: [] {"logging-call-at": "updatetask.go:210"}
2024-06-19T21:08:41.229Z  DEBUG found zero updates from current version 1.11 {"logging-call-at": "updatetask.go:132"}
2024-06-19T21:08:41.229Z  INFO  UpdateSchemaTask done {"logging-call-at": "updatetask.go:125"}
2024-06-19T21:08:41.407Z  INFO  Starting schema setup {"config": {"SchemaFilePath":"","SchemaName":"","InitialVersion":"0.0","Overwrite":false,"DisableVersioning":false}, "logging-call-at": "setuptask.go:63"}
2024-06-19T21:08:41.407Z  DEBUG Setting up version tables {"logging-call-at": "setuptask.go:73"}
2024-06-19T21:08:41.435Z  DEBUG Current database schema version 1.1 is greater than initial schema version 0.0. Skip version upgrade {"logging-call-at": "setuptask.go:134"}
2024-06-19T21:08:41.435Z  INFO  Schema setup complete {"logging-call-at": "setuptask.go:149"}
2024-06-19T21:08:41.633Z  INFO  UpdateSchemaTask started {"config": {"DBName":"","TargetVersion":"","SchemaDir":"/etc/temporal/schema/postgresql/v96/visibility/versioned","SchemaName":"","IsDryRun":false}, "logging-call-at": "updatetask.go:102"}
2024-06-19T21:08:41.638Z  DEBUG Schema Dirs: [] {"logging-call-at": "updatetask.go:210"}
2024-06-19T21:08:41.638Z  DEBUG found zero updates from current version 1.1 {"logging-call-at": "updatetask.go:132"}
2024-06-19T21:08:41.638Z  INFO  UpdateSchemaTask done {"logging-call-at": "updatetask.go:125"}
Temporal CLI address: 172.31.37.167:7233.
2024/06/19 21:08:41 Loading config; env=docker,zone=,configDir=config
2024/06/19 21:08:41 Loading config files=[config/docker.yaml]
{"level":"info","ts":"2024-06-19T21:08:41.888Z","msg":"Build info.","git-time":"2024-03-22T16:43:28.000Z","git-revision":"92489dd75f17a2daa0a537278c8b6337f71fd704","git-modified":true,"go-arch":"amd64","go-os":"linux","go-version":"go1.22.1","cgo-enabled":false,"server-version":"1.23.0-rc16","debug-mode":false,"logging-call-at":"main.go:148"}
{"level":"info","ts":"2024-06-19T21:08:41.889Z","msg":"dynamic config changed for the key: limit.blobsize.warn oldValue: nil newValue: { constraints: {} value: 10485760 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.889Z","msg":"dynamic config changed for the key: frontend.throttledlogrps oldValue: nil newValue: { constraints: {} value: 20 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.889Z","msg":"dynamic config changed for the key: history.historymgrnumconns oldValue: nil newValue: { constraints: {} value: 50 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.889Z","msg":"dynamic config changed for the key: system.advancedvisibilitywritingmode oldValue: nil newValue: { constraints: {} value: off }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.889Z","msg":"dynamic config changed for the key: history.defaultactivityretrypolicy oldValue: nil newValue: { constraints: {} value: map[BackoffCoefficient:2 InitialIntervalInSeconds:1 MaximumAttempts:0 MaximumIntervalCoefficient:100] }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.890Z","msg":"dynamic config changed for the key: history.persistencemaxqps oldValue: nil newValue: { constraints: {} value: 3000 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.890Z","msg":"dynamic config changed for the key: frontend.persistencemaxqps oldValue: nil newValue: { constraints: {} value: 3000 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.890Z","msg":"dynamic config changed for the key: history.defaultworkflowretrypolicy oldValue: nil newValue: { constraints: {} value: map[BackoffCoefficient:2 InitialIntervalInSeconds:1 MaximumAttempts:0 MaximumIntervalCoefficient:100] }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.890Z","msg":"dynamic config changed for the key: limit.blobsize.error oldValue: nil newValue: { constraints: {} value: 15728640 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.890Z","msg":"dynamic config changed for the key: frontend.enableclientversioncheck oldValue: nil newValue: { constraints: {} value: true }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.890Z","msg":"dynamic config changed for the key: frontend.historymgrnumconns oldValue: nil newValue: { constraints: {} value: 30 }","logging-call-at":"file_based_client.go:275"}
{"level":"info","ts":"2024-06-19T21:08:41.890Z","msg":"Updated dynamic config","logging-call-at":"file_based_client.go:195"}
{"level":"warn","ts":"2024-06-19T21:08:41.891Z","msg":"Not using any authorizer and flag `--allow-no-auth` not detected. Future versions will require using the flag `--allow-no-auth` if you do not want to set an authorizer.","logging-call-at":"main.go:178"}
[Fx] PROVIDE  *temporal.ServerImpl <= go.temporal.io/server/temporal.NewServerFxImpl()
[Fx] PROVIDE  *temporal.serverOptions <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  chan interface {} <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  temporal.synchronizationModeParams <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  *config.Config <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  *config.PProf <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  log.Config <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  resource.ServiceNames <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  resource.NamespaceLogger <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  resolver.ServiceResolver <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  client.AbstractDataStoreFactory <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  visibility.VisibilityStoreFactory <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  searchattribute.Mapper <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  []grpc.UnaryServerInterceptor <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  authorization.Authorizer <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  authorization.ClaimMapper <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  authorization.JWTAudienceMapper <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  log.Logger <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  client.FactoryProvider <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  dynamicconfig.Client <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  encryption.TLSConfigProvider <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  *client.Config <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  client.Client <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  metrics.Handler <= go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] PROVIDE  *dynamicconfig.Collection <= go.temporal.io/server/common/dynamicconfig.NewCollection()
[Fx] PROVIDE  archiver.ArchivalMetadata <= go.temporal.io/server/common/resource.ArchivalMetadataProvider()
[Fx] PROVIDE  tasks.TaskCategoryRegistry <= go.temporal.io/server/temporal.TaskCategoryRegistryProvider()
[Fx] PROVIDE  client.FactoryProviderFn <= go.temporal.io/server/temporal.PersistenceFactoryProvider()
[Fx] PROVIDE  *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.HistoryServiceProvider()
[Fx] PROVIDE  *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.MatchingServiceProvider()
[Fx] PROVIDE  *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.FrontendServiceProvider()
[Fx] PROVIDE  *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.InternalFrontendServiceProvider()
[Fx] PROVIDE  *temporal.ServicesMetadata[group = "services"] <= go.temporal.io/server/temporal.WorkerServiceProvider()
[Fx] PROVIDE  *cluster.Config <= go.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider()
[Fx] PROVIDE  config.Persistence <= go.temporal.io/server/temporal.ApplyClusterMetadataConfigProvider()
[Fx] PROVIDE  *pprof.PProfInitializerImpl <= go.temporal.io/server/common/pprof.NewInitializer()
[Fx] PROVIDE  []trace.SpanExporter <= go.temporal.io/server/temporal.init.func2()
[Fx] SUPPLY []temporal.ServerOption
[Fx] PROVIDE  fx.Lifecycle <= go.uber.org/fx.New.func1()
[Fx] PROVIDE  fx.Shutdowner <= go.uber.org/fx.(*App).shutdowner-fm()
[Fx] PROVIDE  fx.DotGraph <= go.uber.org/fx.(*App).dotGraph-fm()
[Fx] RUN  supply: stub([]temporal.ServerOption)
[Fx] RUN  provide: go.temporal.io/server/temporal.ServerOptionsProvider()
[Fx] Error returned: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider
  /home/builder/temporal/temporal/fx.go:180:
sql schema version compatibility check failed: pq: no pg_hba.conf entry for host "172.31.37.167", user "airbyte_user", database "temporal_db", no encryption
[Fx] ERROR    Failed to initialize custom logger: could not build arguments for function "go.uber.org/fx".(*module).constructCustomLogger.func2
  /go/pkg/mod/go.uber.org/fx@v1.20.0/module.go:251:
failed to build fxevent.Logger:
could not build arguments for function "go.temporal.io/server/temporal".init.func8
  /home/builder/temporal/temporal/fx.go:1029:
failed to build log.Logger:
received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider
  /home/builder/temporal/temporal/fx.go:180:
sql schema version compatibility check failed: pq: no pg_hba.conf entry for host "172.31.37.167", user "airbyte_user", database "temporal_db", no encryption
Unable to create server. Error: could not build arguments for function "go.uber.org/fx".(*module).constructCustomLogger.func2 (/go/pkg/mod/go.uber.org/fx@v1.20.0/module.go:251): failed to build fxevent.Logger: could not build arguments for function "go.temporal.io/server/temporal".init.func8 (/home/builder/temporal/temporal/fx.go:1029): failed to build log.Logger: received non-nil error from function "go.temporal.io/server/temporal".ServerOptionsProvider (/home/builder/temporal/temporal/fx.go:180): sql schema version compatibility check failed: pq: no pg_hba.conf entry for host "172.31.37.167", user "airbyte_user", database "temporal_db", no encryption.

Group 2

Experiment 2-1

temporal:
  extraEnv:
    - name: SKIP_DB_CREATE
      value: "true"
    - name: DBNAME
      value: temporal_db
    - name: VISIBILITY_DBNAME
      value: temporal_visibility_db
    - name: POSTGRES_TLS_ENABLED
      value: "true"
    - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
      value: "false"

The whole Airbyte deployment succeeded. However, the pod "airbyte-temporal" log shows an error at the beginning; I guess it falls back to a non-SSL connection:

TEMPORAL_ADDRESS is not set, setting it to 172.31.44.204:7233
PostgreSQL started.
Setup PostgreSQL schema.
2024-06-20T04:50:17.808Z  ERROR Unable to connect to SQL database. {"error": "tls: failed to verify certificate: x509: certificate signed by unknown authority", "logging-call-at": "handler.go:52"}
2024/06/20 04:50:17 Loading config; env=docker,zone=,configDir=config
2024/06/20 04:50:17 Loading config files=[config/docker.yaml]
{"level":"info","ts":"2024-06-20T04:50:17.997Z","msg":"Build info.","git-time":"2024-03-22T16:43:28.000Z","git-revision":"92489dd75f17a2daa0a537278c8b6337f71fd704","git-modified":true,"go-arch":"amd64","go-os":"linux","go-version":"go1.22.1","cgo-enabled":false,"server-version":"1.23.0-rc16","debug-mode":false,"logging-call-at":"main.go:148"}
# ...

Experiment 2-2

temporal:
  extraVolumes:
    - name: airbyte-config-map-volume
      configMap:
        name: production-hm-airbyte-config-map
  extraVolumeMounts:
    - name: airbyte-config-map-volume
      subPath: amazon-rds-ca-global-bundle.pem
      mountPath: /etc/ssl/certs/amazon-rds-ca-global-bundle.pem
  extraEnv:
    - name: SKIP_DB_CREATE
      value: "true"
    - name: DBNAME
      value: temporal_db
    - name: VISIBILITY_DBNAME
      value: temporal_visibility_db
    - name: POSTGRES_TLS_ENABLED
      value: "true"
    - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
      value: "false"
    - name: POSTGRES_TLS_CA_FILE
      value: /etc/ssl/certs/amazon-rds-ca-global-bundle.pem

The whole Airbyte deployment succeeded, and pod "airbyte-temporal" has no errors at all.

Also, I found that POSTGRES_TLS_CA_FILE here is actually optional: as long as the PEM file is at /etc/ssl/certs/amazon-rds-ca-global-bundle.pem, there is no "failed to verify certificate: x509: certificate signed by unknown authority" error. I also found that all the other CA PEM files live in that folder as well.
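
This is consistent with /etc/ssl/certs being the image's system OpenSSL trust directory, so a CA bundle mounted there is trusted even without an explicit POSTGRES_TLS_CA_FILE. The chain can be sanity-checked against that directory (a sketch; it assumes OpenSSL 1.1.1+ for -starttls postgres):

# Handshake against RDS using the system trust directory:
openssl s_client -starttls postgres -CApath /etc/ssl/certs \
  -connect production-hm-airbyte-postgres.xxx.us-west-2.rds.amazonaws.com:5432 </dev/null
# Look for "Verify return code: 0 (ok)" in the output.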

Group 3

The Group 3 experiments were done together with Rob Holland from the Temporal team. The Slack conversation is here.

Experiment 3-1

temporal:
  extraEnv:
    - name: SKIP_DB_CREATE
      value: "true"
    - name: DBNAME
      value: temporal_db
    - name: VISIBILITY_DBNAME
      value: temporal_visibility_db
    - name: POSTGRES_TLS_ENABLED
      value: "true"
    - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
      value: "true"

Pod "airbyte-temporal" log shows connecting to RDS successfully in the beginning then failed with

sql schema version compatibility check failed: pq: no pg_hba.conf entry for host "172.31.41.182", user "airbyte_user", database "temporal_db", no encryption

This is the same log as in "Experiment 1-3".

Experiment 3-3

temporal:
  extraEnv:
    - name: SKIP_DB_CREATE
      value: "true"
    - name: DBNAME
      value: temporal_db
    - name: VISIBILITY_DBNAME
      value: temporal_visibility_db
    - name: POSTGRES_TLS_ENABLED
      value: "true"
    - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
      value: "true"
    - name: SQL_TLS
      value: "true"
    - name: SQL_TLS_DISABLE_HOST_VERIFICATION
      value: "true"
    - name: SQL_TLS_ENABLED
      value: "true"
    - name: SQL_HOST_VERIFICATION
      value: "true"

Pod "airbyte-temporal" log shows connecting to RDS successfully in the beginning then failed at sql schema version compatibility check step. Same log to "Experiment 1-3".

hongbo-miao commented 3 months ago

I got some help from Rob Holland from the Temporal team on Slack. Thanks!

It may be because of these lines in Airbyte: https://github.com/airbytehq/airbyte-platform/blob/686cdb20a42865cea557871020ecbd44ca8ef8e1/airbyte-temporal/scripts/update-and-start-temporal.sh#L50-L55

If you look at the latest Temporal code, https://github.com/temporalio/docker-builds/blob/40955e0f772939045dc7830c20f704149d9e81c7/docker/auto-setup.sh#L208-L293,

it allows passing these parameters:

POSTGRES_TLS_ENABLED
POSTGRES_TLS_DISABLE_HOST_VERIFICATION
POSTGRES_TLS_CERT_FILE
POSTGRES_TLS_KEY_FILE
POSTGRES_TLS_CA_FILE
POSTGRES_TLS_SERVER_NAME

when calling temporal-sql-tool setup-schema, but the Airbyte code does not pass these parameters through.
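
For illustration, the upstream flag plumbing looks roughly like this (a sketch condensed from the linked auto-setup.sh; the variable names come from that script, and this is not Airbyte's actual update-and-start-temporal.sh):

# Translate POSTGRES_TLS_* env vars into temporal-sql-tool TLS flags.
POSTGRES_TLS_ARGS=()
if [[ ${POSTGRES_TLS_ENABLED} == true ]]; then
  POSTGRES_TLS_ARGS+=( --tls )
  [[ -n ${POSTGRES_TLS_CERT_FILE} ]] && POSTGRES_TLS_ARGS+=( --tls-cert-file "${POSTGRES_TLS_CERT_FILE}" )
  [[ -n ${POSTGRES_TLS_KEY_FILE} ]] && POSTGRES_TLS_ARGS+=( --tls-key-file "${POSTGRES_TLS_KEY_FILE}" )
  [[ -n ${POSTGRES_TLS_CA_FILE} ]] && POSTGRES_TLS_ARGS+=( --tls-ca-file "${POSTGRES_TLS_CA_FILE}" )
  [[ ${POSTGRES_TLS_DISABLE_HOST_VERIFICATION} == true ]] && POSTGRES_TLS_ARGS+=( --tls-disable-host-verification )
  [[ -n ${POSTGRES_TLS_SERVER_NAME} ]] && POSTGRES_TLS_ARGS+=( --tls-server-name "${POSTGRES_TLS_SERVER_NAME}" )
fi

# Upstream passes "${POSTGRES_TLS_ARGS[@]}" to every temporal-sql-tool call;
# the Airbyte script omits them, so the schema steps connect without TLS.
temporal-sql-tool --plugin postgres --ep "${POSTGRES_SEEDS}" -u "${POSTGRES_USER}" \
  -p "${DB_PORT}" "${POSTGRES_TLS_ARGS[@]}" --db "${DBNAME}" setup-schema -v 0.0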

This explains why, when I pass the CA PEM file, the pod "airbyte-temporal" log shows a successful connection to RDS at the beginning and then fails at the "sql schema version compatibility check" step.

marcosmarxm commented 3 months ago

Thanks for the detailed issue, @hongbo-miao. I have added it to the platform team's backlog to implement in a future sprint.

Hesperide commented 2 months ago

Following up on this, after a few internal tests we've validated that adding the following values should resolve this issue:

  extraEnv:
    - name: POSTGRES_TLS_ENABLED
      value: "true"
    - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
      value: "true"
    - name: SQL_TLS_ENABLED
      value: "true"
    - name: SQL_TLS_DISABLE_HOST_VERIFICATION
      value: "true"

We'll also look into making this automatic, so you no longer need to inject these manually in the future.

hongbo-miao commented 2 months ago

> Following up on this, after a few internal tests we've validated that adding the following values should resolve this issue:
>
>   extraEnv:
>     - name: POSTGRES_TLS_ENABLED
>       value: "true"
>     - name: POSTGRES_TLS_DISABLE_HOST_VERIFICATION
>       value: "true"
>     - name: SQL_TLS_ENABLED
>       value: "true"
>     - name: SQL_TLS_DISABLE_HOST_VERIFICATION
>       value: "true"
>
> We'll also look into making this automatic, so you no longer need to inject these manually in the future.

Thanks @Hesperide. Unfortunately, this still does not work for us. 🥲 If you read closely, I already had all of these values set in "Experiment 3-3" above, and it still failed. Does your RDS have force SSL enabled via rds.force_ssl=1? If it is not enabled, this will work for sure.

I explained the reason why it may not work at https://github.com/airbytehq/airbyte/issues/39636#issuecomment-2182034537. Basically, Airbyte's code for calling temporal-sql-tool setup-schema is out of date; it needs to be updated to match the latest temporal-sql-tool setup-schema invocation. That is why Temporal failed to connect to RDS during provisioning.

Do you mind re-opening it? Thank you!