airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Platform Issue: temporal pod crashing on Kubernetes for new deployment #25815

Closed bkonicek-calm closed 6 months ago

bkonicek-calm commented 1 year ago

Platform Version

0.44.0

What step the error happened?

On deploy

Relevant information

I'm deploying Helm chart version 0.45.11 with an external GCP Postgres Database. I've already successfully deployed this to other environments with the exact same chart version and same values.

Now, my temporal pod is failing to start, which causes the deployment to fail.

When I start a debug pod and copy the temporal pod to it, I can see that there is no config/docker.yaml file. Since the image tag for the temporal container is the same as in all my other clusters, what could be causing the config file to not be generated?

Relevant log output

+ temporal-sql-tool --plugin postgres --ep 10.24.0.24 -u airbyte -p 5432 create --db temporal
2023-05-04T18:48:49.897Z    ERROR   Unable to create SQL database.  {"error": "pq: database \"temporal\" already exists", "logging-call-at": "handler.go:97"}
2023/05/04 18:48:49 Loading config; env=docker,zone=,configDir=config
2023/05/04 18:48:49 Loading config files=[config/docker.yaml]
Unable to load configuration: config file corrupted: yaml: line 17: found unknown escape character.

kijewskimateusz commented 1 year ago

Hello Everyone,

I am having a very similar issue on my end. I am also trying to deploy Airbyte with a GCP Cloud SQL database, and after the bootloader completes successfully, my temporal pod fails with these logs:

+ temporal-sql-tool --plugin postgres --ep 172.27.5.3 -u airbyte -p 5432 create --db temporal
2023-05-04T18:47:04.590Z    ERROR    Unable to create SQL database.    {"error": "pq: database \"temporal\" already exists", "logging-call-at": "handler.go:97"}
2023/05/04 18:47:04 Loading config; env=docker,zone=,configDir=config
2023/05/04 18:47:04 Loading config files=[config/docker.yaml]
Unable to load configuration: config file corrupted: yaml: line 29: did not find expected key.

bkonicek-calm commented 1 year ago

Small update: I tried deploying this to a local Kubernetes cluster (Docker Desktop) and it started just fine. It seems there might be an issue when running against an external database, where it doesn't properly handle the temporal database already existing. Although I'm not sure how the database got there in the first place...

rowanmoul commented 1 year ago

The startup script provides no way to skip db creation, and it doesn't check if the db exists before trying to create it. https://github.com/airbytehq/airbyte-platform/blob/main/airbyte-temporal/scripts/update-and-start-temporal.sh

I am also running into this issue trying to deploy on Azure Kubernetes with an external Azure PostgreSQL database (not running in the cluster or created with the Airbyte Helm chart).

The temporal auto-setup script has an option to skip db creation: https://github.com/temporalio/docker-builds/blob/main/docker/auto-setup.sh
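
For illustration, a check-before-create step could look roughly like the sketch below. This is not what the Airbyte startup script does. The helper names (`db_exists`, `create_db_if_missing`) and the `DB_HOST`, `DB_PORT`, and `DB_USER` variables are made up for this example, and it assumes `psql` is available in the image, that a maintenance database named `postgres` exists, and that `PGPASSWORD` is set for `psql`. The `temporal-sql-tool` invocation is copied from the log output above.

```shell
#!/bin/sh
# Sketch only: create the Temporal database if it is missing.
# DB_HOST, DB_PORT, DB_USER and both function names are assumptions.

# Returns 0 if a database named "$1" exists, non-zero otherwise.
# "$1" is interpolated into SQL, so it must be a trusted, fixed name.
db_exists() {
  # -tA prints bare rows; the query yields "1" when the database is present.
  result=$(psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d postgres -tAc \
    "SELECT 1 FROM pg_database WHERE datname = '$1'")
  [ "$result" = "1" ]
}

# Runs the create command only when the database is absent.
create_db_if_missing() {
  if db_exists "$1"; then
    echo "database $1 already exists, skipping create"
  else
    temporal-sql-tool --plugin postgres --ep "$DB_HOST" -u "$DB_USER" -p "$DB_PORT" create --db "$1"
  fi
}

# Example call (commented out so the sketch has no side effects):
# create_db_if_missing temporal
```

With a guard like this, a pod restart would hit the "already exists" branch instead of logging an error from the create step.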

bkonicek-calm commented 1 year ago

@rowanmoul interesting find. It's odd because I didn't do anything differently on my previous deployments, and it didn't have this problem. Unfortunately I'm not able to see that far back in the logs to confirm whether or not they also tried to initialize the database.

The other thing that's confusing me is what happens if the temporal pod restarts - won't it try to create the database again?

rowanmoul commented 1 year ago

Further digging has uncovered what appears to be the root cause of the problem (though it doesn't explain why more people didn't encounter this issue before now). airbyte-temporal is using a rather old version of Temporal from 2021 (1.13.0), whose temporal-sql-tool does not fail gracefully when the database already exists. Compare 1.13.0 to the only marginally more recent 1.14.0 (the current latest version is 1.20.2).

Also, I noticed that while this module exists (along with the image that contains it), it doesn't appear to be used by the Airbyte Helm chart, which just references airbyte/temporal-auto-setup directly (see here and here). Using the airbyte/temporal image instead seems to allow temporal to start up correctly, despite the errors about the database already existing. I added this to my values YAML:

temporal:
  image:
    repository: airbyte/temporal
    tag: 0.44.4

rowanmoul commented 1 year ago

It looks like the Airbyte team has resolved this by updating to a newer Temporal version and using the upstream container image directly, so the above is not needed as of Helm Chart version 47.11 (release 50.11). However, if you don't give your db user database creation privileges, you need to set this environment variable to prevent temporal from trying to create the db:

temporal:
  extraEnv:
    - name: SKIP_DB_CREATE
      value: "true"
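
As a rough sketch of how an opt-out variable like this typically gates a setup step (this is not the upstream script itself): `maybe_create_db` and `create_temporal_db` are hypothetical names, as are `DB_HOST`, `DB_PORT`, and `DB_USER`. The `temporal-sql-tool` invocation mirrors the one in the logs earlier in this thread.

```shell
#!/bin/sh
# Sketch only: an environment-variable guard around database creation.
# Function names and DB_* variables are assumptions for illustration.

# Hypothetical stand-in for the real creation command.
create_temporal_db() {
  temporal-sql-tool --plugin postgres --ep "$DB_HOST" -u "$DB_USER" -p "$DB_PORT" create --db temporal
}

# Skips creation when SKIP_DB_CREATE is exactly the string "true";
# an unset variable is treated as "false".
maybe_create_db() {
  if [ "${SKIP_DB_CREATE:-false}" = "true" ]; then
    echo "SKIP_DB_CREATE=true, not creating database"
    return 0
  fi
  create_temporal_db
}

# Example call (commented out so the sketch has no side effects):
# maybe_create_db
```

The quotes around "true" in the values snippet matter: Kubernetes requires environment variable values to be strings, and an unquoted true would be parsed as a YAML boolean.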

davinchia commented 6 months ago

Thanks, closing this as it's old.