airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[helm] Using GCS storage without an existing credentials secret results in broken configuration #48502

Open reidab opened 1 week ago

reidab commented 1 week ago

Helm Chart Version

airbyte-1.2.0

What step the error happened?

During the Sync

Relevant information

When using GCS storage, the Helm chart supports configuring credentials either by passing the name of an existing Kubernetes secret in global.storage.storageSecretName or by setting global.storage.gcs.credentialsJson to the base64-encoded JSON of the credentials file.

global:
  storage:
    type: gcs
    # storageSecretName: ""

    # GCS
    bucket:
      log: foo-airbyte
      state: foo-airbyte
      workloadOutput: foo-airbyte
    gcs:
      authenticationType: "credentials"
      projectId: foo-project
      credentialsJson: <base64-encoded credentials JSON>

If you pass the credentials JSON directly, this template will render a secret called {{ .Release.Name }}-gcs-log-creds: https://github.com/airbytehq/airbyte-platform/blob/4ece0a2719873d4ca28c44362b363ba2786381c7/charts/airbyte/templates/gcs-log-creds-secret.yaml#L1-L6
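For readers who don't follow the link, the shape of that template is roughly the following (a sketch reconstructed from the description above; the data key name is an assumption, and only the secret name matters for this issue):

```yaml
# Sketch only -- see the linked gcs-log-creds-secret.yaml for the real source.
apiVersion: v1
kind: Secret
metadata:
  name: {{ .Release.Name }}-gcs-log-creds
type: Opaque
data:
  # key name is illustrative; the value comes from the chart values
  gcp.json: {{ .Values.global.storage.gcs.credentialsJson }}
```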

This secret is only created if the following conditional is true:

{{- if and 
  (eq .Values.global.deploymentMode "oss") 
  (eq (lower (default "" .Values.global.storage.type)) "gcs") 
  (not .Values.global.storage.secretName)
  (not .Values.global.storage.storageSecretName)
}}

This means that if global.storage.storageSecretName is set, this -gcs-log-creds secret will not be created, leaving you with two options:

  1. Create a secret manually and set global.storage.storageSecretName
  2. Leave global.storage.storageSecretName unset and pass credentials to have the -gcs-log-creds secret created automatically.

Jumping over to the deployment templates for the worker and workload launcher pods, both of them define a CONTAINER_ORCHESTRATOR_SECRET_NAME environment variable using the airbyte.secretStoreName helper, pointed at global.storage.storageSecretName.

If global.storage.storageSecretName is not set, airbyte.secretStoreName will fall back to a value of airbyte-config-secrets.
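That fallback behavior can be sketched as follows (an illustrative reconstruction of the helper based on the behavior described above, not the chart's exact source):

```yaml
{{/* Sketch: return the configured secret name, or the hardcoded default. */}}
{{- define "airbyte.secretStoreName" -}}
{{- if . -}}
{{- . -}}
{{- else -}}
airbyte-config-secrets
{{- end -}}
{{- end -}}
```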


Given all that, if you try to run using GCS storage with the configuration at the top of this post — with credentials passed to global.storage.gcs.credentialsJson and global.storage.storageSecretName unset — you end up with your credentials stored in a secret called {deployment-name}-gcs-log-creds, but your worker and launcher pods referencing a secret called airbyte-config-secrets that does not exist. This causes all worker pods to fail to start.

It seems like there are two possible fixes:

  1. Fix the default behavior in the worker/launcher pods so that it falls back to {deployment-name}-gcs-log-creds instead of airbyte-config-secrets when global.storage.storageSecretName is not set.
  2. Update the GCS storage configuration so that it uses global.storage.storageSecretName to set the name of the secret it creates instead of always using {deployment-name}-gcs-log-creds.
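As an illustration of option 1, the worker/launcher templates could compute the fallback at the call site with Helm's built-in default function (a hypothetical sketch, not the chart's actual code):

```yaml
- name: CONTAINER_ORCHESTRATOR_SECRET_NAME
  value: {{ .Values.global.storage.storageSecretName | default (printf "%s-gcs-log-creds" .Release.Name) | quote }}
```

This keeps the existing behavior when global.storage.storageSecretName is set, and otherwise points at the secret the chart actually creates.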

Relevant log output

reidab commented 1 week ago

For anyone else who finds this, I've temporarily worked around it by explicitly adding extraEnv values in the worker and launcher configuration to override the incorrectly set env var.

  extraEnv:
    - name: CONTAINER_ORCHESTRATOR_SECRET_NAME
      value: airbyte-gcs-log-creds
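
For context, these go in the values file under the worker and workload-launcher sections (section names assumed from the chart layout; airbyte-gcs-log-creds corresponds to <release-name>-gcs-log-creds for a release named airbyte):

```yaml
worker:
  extraEnv:
    - name: CONTAINER_ORCHESTRATOR_SECRET_NAME
      value: airbyte-gcs-log-creds  # i.e. <release-name>-gcs-log-creds
workload-launcher:
  extraEnv:
    - name: CONTAINER_ORCHESTRATOR_SECRET_NAME
      value: airbyte-gcs-log-creds
```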
marcosmarxm commented 4 days ago

@airbytehq/platform-deployments can someone take a look into this?

edwandr commented 1 day ago

Ran into the same issue. Thanks @reidab for the workaround, which worked for me as well!