+1 This happened for me as well, both going from 0.50.54 to 0.63.11 and also going from 0.63.8 to 0.63.11. `Role` and `RoleBinding` were both deleted and not recreated. Upgrade `--debug` output:

```
client.go:486: [debug] Starting delete for "airbyte-admin" ServiceAccount
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "airbyte-admin-binding" RoleBinding
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "airbyte-admin-role" Role
client.go:142: [debug] creating 1 resource(s)
```
The `--debug` output did include the `airbyte/templates/serviceaccount.yaml` template output for both the Role and RoleBinding, so I was able to `kubectl create` the resources from that, but upgrade seems broken.
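If anyone else needs to do the same, a rough sketch of that recovery is to re-render just that template and apply it, rather than copying YAML out of the `--debug` output. The release name, namespace, chart version, values file, and template path below are assumptions based on this thread; adjust them to your install:

```bash
# Sketch only: re-render the hook template that contains the ServiceAccount,
# Role, and RoleBinding, then apply the result by hand.
helm template airbyte airbyte/airbyte \
  --version 0.363.0 \
  -f values.yaml \
  --show-only templates/serviceaccount.yaml \
  | kubectl apply -n airbyte -f -
```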
I'm running on GKE, helm version:

```
version.BuildInfo{Version:"v3.13.1", GitCommit:"3547a4b5bf5edb5478ce352e18858d8a552a4110", GitTreeState:"clean", GoVersion:"go1.21.3"}
```

`helm search repo airbyte/airbyte`:

```
airbyte/airbyte    0.363.0    0.63.11    Helm chart to deploy airbyte
```
I'm guessing this is the offending change, since the hook now also runs on `pre-upgrade` where before it was only `pre-install`:

https://github.com/airbytehq/airbyte-platform/compare/v0.63.10...v0.63.11#diff-d0874cce592344af301414d17a2b74f107d9291a26f0205749fc8ac218ae2457

. . . but I'm seeing the same output as @ryanschwartz, which shows the delete and create of 3 resources, yet only the `ServiceAccount` object actually gets created (not the `Role` and `RoleBinding`).
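For context: Helm's default hook delete policy is `before-hook-creation`, which deletes the previous hook resource right before creating the new one, so a hook that now also fires on `pre-upgrade` would produce exactly the delete-then-create lines in the `--debug` output above (why the recreated objects then go missing is a separate question). One way to confirm the annotation change is to render the template from each chart version and compare the hook annotations; the release name and template path below are assumptions:

```bash
# Sketch: compare the rendered hook annotations between the last chart version
# that worked for you and 0.363.0.
helm template airbyte airbyte/airbyte --version <last-working-chart-version> \
  --show-only templates/serviceaccount.yaml | grep -A3 'helm.sh/hook'
helm template airbyte airbyte/airbyte --version 0.363.0 \
  --show-only templates/serviceaccount.yaml | grep -A3 'helm.sh/hook'
```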
@airbytehq/platform-deployments can someone take a look into this issue?
@NAjustin Thanks for reporting this. I'm going to test the fix and hopefully have something to get out shortly
We're still investigating a proper fix, but as a potential workaround, running `kubectl rollout restart deployment <release-name>-worker -n <namespace>` after the problematic `helm upgrade` may get things working again by forcing a new worker pod to spin up with the recreated service account.
@ryanschwartz and @NAjustin if you give that a try and it gets things working again, do let us know as that will help us work out a proper solution!
@pmossman that will only restart the worker pod - manual intervention was needed to recreate the Role and RoleBinding for me, at which point the worker began functioning as expected.
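For anyone combining the two workarounds, a rough sequence (resource and deployment names assumed from the snippets in this thread, including the YAML posted below; adjust to your release) is to make sure the RBAC objects exist again before bouncing the worker:

```bash
# Verify the hook resources are back (recreate them by hand if not),
# then restart the worker so it picks up the service account again.
kubectl get serviceaccount,role,rolebinding -n airbyte | grep airbyte-admin
kubectl rollout restart deployment airbyte-worker -n airbyte
```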
Baffled and disappointed that this is not a higher priority. This happened to us as well when upgrading and completely broke things. We had to fix it manually, the way @ryanschwartz suggested.
For those who come after us, this is the YAML I applied by hand; adjust values as needed:
```yaml
# Source: airbyte/templates/serviceaccount.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airbyte-admin-role
  namespace: airbyte
  labels:
    helm.sh/chart: airbyte-0.551.0
    app.kubernetes.io/name: airbyte
    app.kubernetes.io/instance: airbyte
    app.kubernetes.io/version: "0.64.3"
    app.kubernetes.io/managed-by: Helm
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-5"
rules:
  - apiGroups: ["*"]
    resources: ["jobs", "pods", "pods/log", "pods/exec", "pods/attach", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] # over-permission for now
---
# Source: airbyte/templates/serviceaccount.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airbyte-admin-binding
  namespace: airbyte
  labels:
    helm.sh/chart: airbyte-0.551.0
    app.kubernetes.io/name: airbyte
    app.kubernetes.io/instance: airbyte
    app.kubernetes.io/version: "0.64.3"
    app.kubernetes.io/managed-by: Helm
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-3"
roleRef:
  apiGroup: ""
  kind: Role
  name: airbyte-admin-role
subjects:
  - kind: ServiceAccount
    name: airbyte-admin
```
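If it helps, a quick way to confirm the recreated binding actually takes effect (names taken from the snippet above; adjust to your namespace and service account):

```bash
# Should print "yes" once the Role and RoleBinding above are back in place.
kubectl auth can-i list pods \
  --as=system:serviceaccount:airbyte:airbyte-admin -n airbyte
```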
For what it's worth, @bgroff posted this comment earlier today in Slack:
> We have rolled out a change that should help with the role binding issue at the end of last week. We have another change that we will be landing in the next few days to make this work better.
This is still an issue at app version v0.64.7 (Helm chart version 0.654.0, the last before v1).
Fixed with @KTamas' extra resources
This is still an issue at app version v1.1.0 (Helm chart version 1.1.0).
Fixed with @KTamas's extra resources.
For those unfamiliar with applying changes like that, you need to run:
`kubectl apply -n default -f modules/helm/airbyte_temp/sa-roles.yml`
(adjust the namespace and file path to match your install).
Some people fear dying, others fear living... I fear airbyte upgrades
@perangel Any update on this from the Airbyte side?
(I think a few of the issues people are running into in the Community Slack are also related, but they just don't know k8s well enough to articulate the problem.)
@gavin-ob Ouch (and here I told @marcosmarxm it was getting better 😂):
> Some people fear dying, others fear living... I fear airbyte upgrades
Same issue with the upgrade to 1.1.1. Thanks to @gavin-ob and @KTamas for describing the workaround! When applying the suggested YAML, the labels and annotations can be deleted entirely; they are not relevant to making it work.
It's been over two months since I posted that snippet, and this is getting really embarrassing for Airbyte, in my opinion, anyways.
Helm Chart Version
0.350.0
What step the error happened?
Upgrading the Platform or Helm Chart
Relevant information
New App version: 0.63.11
Prior App Version: 0.63.9
Platform: GKE (Autopilot cluster)
Everything upgraded fine, but when trying to check or sync connections, we started getting errors like:

- `Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:airbyte-ns:REDACTED" cannot list resource "pods" in API group "" in the namespace "airbyte-ns"`
- `Guest attributes endpoint access is disabled`
- `"403 Forbidden" for request "PUT http://metadata.google.internal/computeMetadata/v1/instance/guest-attributes/guestInventory/Hostname"`
It seems similar to some past threads:
I do already have all these set in config:
It seems like there may be a missing role binding or something along those lines. For what it's worth, we're using GKE Autopilot and a non-default service account (meaning not the default one provisioned for the cluster, and also not named `airbyte-admin`). As a workaround, I granted our SA roles/container.clusterAdmin, but it really shouldn't need these permissions to create pods in its own deployments.
I saw another user reported a similar issue in this Slack thread.
Relevant log output
No response