Open joeybenamy opened 2 months ago
@joeybenamy what was the previous version you were using?
0.344.2
@airbytehq/platform-deployments fyi
This may have been fixed by ~https://github.com/airbytehq/airbyte-platform/commit/57319f7ebc8626ca93b600e6c593e78fd24a705d~ (oops, wrong link) https://github.com/airbytehq/airbyte-platform/commit/2ed01e554d576bd60011583ea988aeac8980f2f0
Seems like a duplicate of https://github.com/airbytehq/airbyte/issues/28389
@abuchanan-airbyte thank you, awaiting the release!
Likewise. Thank you!
Testing with Helm Chart 1.1.0 and Airbyte platform 1.1.0. Tolerations are still not present on job pods.
@abuchanan-airbyte and @tryangul fyi
Any update on this? Is this a Helm Chart issue or an Airbyte platform issue?
This is a work in progress, @joeybenamy. We hope to have an update by end of week.
I am also facing an issue with this. Can someone please confirm if it's fixed now?
As far as I can see, a fix has been merged to the default branch, but it is not yet available as part of a release.
We internally built an image of workload-launcher from v1.1.0 with the fix cherry-picked. I can see the tolerations being propagated to the pod when using our custom image.
See: https://github.com/airbytehq/airbyte/issues/28389#issuecomment-2446514393
You might try the latest nightly release version 1.1.0-dev-nightly-1730243169-7e1b11aeac
(that's a helm chart version)
Has anyone tried this release version with global.jobs.kube.tolerations set on a cluster where all nodes are tainted? I tried on both AWS EKS and a local kind cluster and cannot get a "rce-postgres-check-" (new source) job pod scheduled on either.
Just got an update from Airbyte:
We're working on setting up the 1.2.0 release candidate today. Not sure what the official release date is, but it will be soon. In the meantime, nightly releases are available
I figured out my issue with the job tolerations: the Helm chart values expect the operator to be set explicitly. That should not be necessary, since per https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/ "The default value for operator is Equal."
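For anyone hitting the same thing, here is a minimal sketch of values with the operator spelled out. Only the global.jobs.kube.tolerations path comes from this thread; the taint key, value, and effect below are hypothetical placeholders, so substitute whatever your nodes are actually tainted with.

```yaml
# values.yaml (sketch only; key/value/effect are hypothetical placeholders)
global:
  jobs:
    kube:
      tolerations:
        - key: "dedicated"      # hypothetical taint key
          operator: "Equal"     # set explicitly, even though Kubernetes defaults to Equal
          value: "airbyte"      # hypothetical taint value
          effect: "NoSchedule"  # hypothetical; must match the effect of the node taint
```

With operator omitted, the same toleration is valid Kubernetes but did not work for me through the chart values.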
This is coming from https://github.com/airbytehq/airbyte-platform/commit/2ca3c4192793b15a1ccc2bfd644dd725c3a2903c#diff-3555dc77946bb010495d4a97b3060553f759452f781e0f5b54b3c8a37394c3b0R227 that was linked in https://github.com/airbytehq/airbyte/issues/28389.
In Airbyte 1.2.0 and Helm chart 1.2.0, this issue appears to be fixed, but now using S3 for logs and state seems to be broken: https://github.com/airbytehq/airbyte/issues/48407
So I'm still stuck.
Helm Chart Version
1.0.0
What step the error happened?
On deploy
Relevant information
On prior versions of the Helm Chart, tolerations set in Helm values are properly propagated to the job pods. In the new version, the tolerations in Helm values are not added to the job pods. As a result, our jobs cannot be scheduled.
In Helm values:
From the job pods:
Relevant log output