airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[platform] Upgrading to v0.63.2 results in continuous failures for all sync jobs. #40546

Open kzvezdarov opened 2 weeks ago

kzvezdarov commented 2 weeks ago

Helm Chart Version

0.233.2

At what step did the error happen?

During the Sync

Relevant information

I'm in the process of upgrading an Airbyte deployment on GKE Autopilot from v0.50.45 to v0.63.2. Following the migration guide at https://docs.airbyte.com/deploying-airbyte/on-kubernetes-via-helm#migration-steps, I upgraded the deployment and its services started successfully, but syncs fail to run with the following errors:

Fully deleting all Airbyte deployments and volumes and reapplying the chart seems to let a few syncs make progress for a few minutes, only for them to start failing in the same manner. I have not tried resetting the Airbyte database, but that's not really an option in any case.

This issue persists on 0.63.1 and 0.60.1. Unfortunately, upgrading the cluster to 0.63.2 has made rolling back to 0.50.45 impossible, as various SQL queries fail due to missing columns and other backward-compatibility issues.

Relevant log output

No response

marcosmarxm commented 2 weeks ago

@airbytehq/platform-move can someone take a look at this? Could this be a migration problem where a column in the database wasn't updated?
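One way to check that angle: Airbyte manages its database schema with Flyway, and the Flyway history table records which migrations ran and whether they succeeded. A minimal sketch of the query, assuming the default history table name airbyte_configs_migrations (adjust if your deployment uses a different name, e.g. airbyte_jobs_migrations for the jobs database):

```sql
-- Inspect the most recent Flyway migrations in the Airbyte config database.
-- Assumption: the schema follows Airbyte's default Flyway layout, with the
-- migration history stored in airbyte_configs_migrations.
SELECT version, description, success, installed_on
FROM airbyte_configs_migrations
ORDER BY installed_on DESC
LIMIT 20;
```

Any row with success = false, or a missing entry for the target version's migrations, would point to an incomplete migration rather than a configuration problem.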

omreego commented 1 week ago

@kzvezdarov I ran into the same issue running Airbyte through Docker Compose on version 0.63.3. I managed to get around it by changing the input to the worker container: by default the value of INTERNAL_API_HOST was airbyte-server:8001, but syncs work when it is http://airbyte-server:8001. In other words, I prepended "http://" to the INTERNAL_API_HOST value passed to the worker container (or the deployment in k8s), and that solved the issue. It took me 3 hours of trial and error to get there. Hope it helps.
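For the Kubernetes case, a minimal sketch of this override as Helm values is below. The worker.extraEnv key is an assumption about the chart's values schema (many charts expose a per-component extraEnv list), so verify it against your chart version before applying.

```yaml
# Sketch of a values.yaml override for the Airbyte Helm chart.
# Assumption: the chart accepts worker.extraEnv as a list of environment
# variables injected into the worker deployment; check your chart
# version's values schema before relying on this.
worker:
  extraEnv:
    - name: INTERNAL_API_HOST
      # Prepend the scheme, per the workaround described above.
      value: "http://airbyte-server:8001"
```

In the Docker Compose case, the same effect typically comes from setting INTERNAL_API_HOST=http://airbyte-server:8001 in the .env file that the compose file reads.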

kzvezdarov commented 6 days ago

> @kzvezdarov I ran into the same issue running Airbyte through Docker Compose on version 0.63.3. I managed to get around it by changing the input to the worker container: by default the value of INTERNAL_API_HOST was airbyte-server:8001, but syncs work when it is http://airbyte-server:8001. In other words, I prepended "http://" to the INTERNAL_API_HOST value passed to the worker container (or the deployment in k8s), and that solved the issue. It took me 3 hours of trial and error to get there. Hope it helps.

Thanks for the suggestion; unfortunately, I've already tried that with no effect. It feels like it might be related to some database state (it's a long-running deployment, initially created on 0.44.x), since the failures persist into 0.63.5 regardless of configuration changes.