airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Kubernetes: sync jobs failing because of invalid orchestrator repl spec #18040

Open coadan opened 1 year ago

coadan commented 1 year ago

All of our sync jobs fail in our K8s cluster after upgrading from 0.40.6 -> 0.40.14 with the following error in the orchestrator-repl-job:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.116.0.1/api/v1/namespaces/dev/pods. Message: Pod "orchestrator-repl-job-11356-attempt-0" is invalid: [spec.volumes[1].secret.secretName: Required value, spec.containers[0].volumeMounts[1].name: Not found: "airbyte-secret", spec.containers[0].volumeMounts[1].mountPath: Required value].

The full log might reveal some better context: logs-11342.txt

What's missing in the setup for that error to be appearing?

EDIT:

  1. Note we are using the stable-with-resource-limits kube overlay to deploy the Airbyte applications.
  2. After some investigation into the source, I'm thinking it's related to this? https://github.com/airbytehq/airbyte/pull/10168/files

Maybe I need to add entries to my .env, but what should the values be?

EDIT2: Disabling CONTAINER_ORCHESTRATOR_ENABLED seems to make stuff work again. Since it's enabled in the default .env, is this expected to work outside your infra?
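For readers trying to decode the error message above, here is a minimal Python sketch of the three validation rules it appears to trip. It is not the Kubernetes API server's code. It assumes that a secret volume with an empty secretName is rejected and, as a result, is not registered as a mountable volume, which would explain why the mount name "airbyte-secret" is reported as "Not found" even though a volume with that name is present. The helper names (validate_pod, make_spec), the "airbyte-config" volume, and the example secret name and mount path are all hypothetical.

```python
from typing import Dict, List


def validate_pod(spec: Dict) -> List[str]:
    """Simplified, hypothetical re-creation of three pod validation rules."""
    errors: List[str] = []
    valid_volume_names = set()

    for i, vol in enumerate(spec.get("volumes", [])):
        secret = vol.get("secret")
        if secret is not None and not secret.get("secretName"):
            errors.append(f"spec.volumes[{i}].secret.secretName: Required value")
            # Assumption: an invalid volume is not registered as mountable.
            continue
        valid_volume_names.add(vol.get("name"))

    for c, container in enumerate(spec.get("containers", [])):
        for m, mount in enumerate(container.get("volumeMounts", [])):
            prefix = f"spec.containers[{c}].volumeMounts[{m}]"
            if mount.get("name") not in valid_volume_names:
                errors.append(f'{prefix}.name: Not found: "{mount.get("name")}"')
            if not mount.get("mountPath"):
                errors.append(f"{prefix}.mountPath: Required value")

    return errors


def make_spec(secret_name: str, mount_path: str) -> Dict:
    """Build a pod spec shaped like the failing one (names are illustrative)."""
    return {
        "volumes": [
            {"name": "airbyte-config", "emptyDir": {}},
            {"name": "airbyte-secret", "secret": {"secretName": secret_name}},
        ],
        "containers": [
            {
                "name": "main",
                "volumeMounts": [
                    {"name": "airbyte-config", "mountPath": "/config"},
                    {"name": "airbyte-secret", "mountPath": mount_path},
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Empty secret name and mount path, as when neither value is configured.
    for err in validate_pod(make_spec("", "")):
        print(err)
```

Under these assumptions, an empty secret name and an empty mount path yield the same three complaints as the log, while supplying both (for example make_spec("gcs-log-creds", "/secrets/gcs-log-creds")) yields none. That would suggest the orchestrator pod is being built with a secret volume whose name and mount path were never set.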

dis-sid commented 1 year ago

I had the same problem deploying the stable-with-resource-limits kustomize overlay on our GKE cluster. I tried to provide CONTAINER_ORCHESTRATOR_SECRET_NAME with valid GCP credentials, without success, so I disabled the orchestrator. Is its only purpose to launch the orchestrator in a separate pod from the worker? Since my syncs are still happening, it looks like I still have "orchestration" working somehow :thinking:

artusiep commented 1 year ago

I have the same issue on version 0.40.18 when I set GOOGLE_APPLICATION_CREDENTIALS=/secrets/gcs-log-creds/gcp.json in the .secrets file of my kustomize overlays. I tried changing this in kustomization.yaml:

...
secretGenerator:
  - name: airbyte-secrets
    env: .secrets
...

to

secretGenerator:
  - name: airbyte-secrets
    env: .secrets
  - name: airbyte-secret
    env: .secrets

but it didn't help. I do not fully understand what the orchestrator is doing, but there seem to be typos here: https://github.com/airbytehq/airbyte/blob/a586537adb7eeefd03b9ec8b0adb444493346ddf/airbyte-commons-worker/src/main/java/io/airbyte/workers/process/AsyncOrchestratorPodProcess.java#L281-L293

c-p-b commented 1 year ago

@coadan can you help us understand if this is still an issue? If so, can you share what version of kubernetes you are using?

mjerzyk-surfer commented 7 months ago

Hey, I've been facing the same issue since I started using a GCS bucket as log storage. It seems that no volumes are mounted on the orchestrator pod, and the only way to overcome the issue is to disable the orchestrator. Do you have any update on this issue?

dis-sid commented 7 months ago

I've just upgraded to airbyte chart version 0.53.52, and having the GCS secrets mounted in the orchestrator container is still a problem. I disabled the worker's container orchestrator again.

Note to users who configured GCS for storing logs: you'll need to disable the orchestrator until this is resolved. The variable worker.containerOrchestrator.enabled doesn't exist anymore (removed from 0.50.34 onwards), so use the extraEnv field instead, like so:

worker:
  extraEnv:
    - name: CONTAINER_ORCHESTRATOR_ENABLED
      value: "false"

tdebroc commented 7 months ago

Hello, we encountered the same issue and applied the same fix as @dis-sid (at exactly the same time you commented, @dis-sid). We hope for a better solution.

LucasSegersFabro commented 7 months ago

I found a solution for running the orchestrator with GCS log/state buckets on airbyte's slack (by Louis Auenau).

The key was setting these two (previously unknown to me) environment variables on the worker deployment:

Setting those two enabled me to turn the orchestrator on again

I don't know if it's related, but I also upgraded airbyte's helm chart to 0.53.196.

mjerzyk-surfer commented 7 months ago

thanks @LucasSegersFabro, it works

arkapravasinha commented 2 months ago

@mjerzyk-surfer, can you please tell me what values you passed in those two environment variables?