NAjustin opened this issue 1 month ago
+1 This happened for me as well, both going from 0.50.54 to 0.63.11 and also going from 0.63.8 to 0.63.11.
The `Role` and `RoleBinding` were both deleted and not recreated. `helm upgrade --debug` output:
```
client.go:486: [debug] Starting delete for "airbyte-admin" ServiceAccount
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "airbyte-admin-binding" RoleBinding
client.go:142: [debug] creating 1 resource(s)
client.go:486: [debug] Starting delete for "airbyte-admin-role" Role
client.go:142: [debug] creating 1 resource(s)
```
The `--debug` output did include the `airbyte/templates/serviceaccount.yaml` template output for both the `Role` and `RoleBinding`, so I was able to `kubectl create` the resources from that, but upgrade seems broken.
I'm running on GKE. Helm version:

```
version.BuildInfo{Version:"v3.13.1", GitCommit:"3547a4b5bf5edb5478ce352e18858d8a552a4110", GitTreeState:"clean", GoVersion:"go1.21.3"}
```
`helm search repo airbyte/airbyte`:

```
airbyte/airbyte 0.363.0 0.63.11 Helm chart to deploy airbyte
```
I'm guessing this is the offending change, since it now runs as `pre-upgrade`, whereas before it was only `pre-install`:

https://github.com/airbytehq/airbyte-platform/compare/v0.63.10...v0.63.11#diff-d0874cce592344af301414d17a2b74f107d9291a26f0205749fc8ac218ae2457

...but I'm seeing the same output as @ryanschwartz, which shows the delete and create of 3 resources, yet only the `ServiceAccount` object actually gets created (not the `Role` and `RoleBinding`).
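For readers unfamiliar with Helm hooks, here is a sketch of the annotation mechanics being discussed. This is illustrative only and is not copied from the chart or from the linked diff; it assumes a resource registered to run on both install and upgrade. Helm accepts a comma-separated list of events in `helm.sh/hook`, and when no `helm.sh/hook-delete-policy` is set, Helm applies `before-hook-creation` by default, deleting the existing hook resource before creating the new one. That default is consistent with the "Starting delete" / "creating 1 resource(s)" pairs in the `--debug` output above.

```yaml
# Illustrative only: metadata for a hook resource that runs on both install and upgrade.
# Not taken from the Airbyte chart; the actual change in the linked diff may differ.
metadata:
  name: airbyte-admin-role
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    # Helm's default when no delete policy is given; written out explicitly here.
    helm.sh/hook-delete-policy: before-hook-creation
```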
@airbytehq/platform-deployments can someone take a look at this issue?
@NAjustin Thanks for reporting this. I'm going to test the fix and hopefully have something out shortly.
We're still investigating a proper fix, but as a potential workaround, running `kubectl rollout restart deployment <release-name>-worker -n <namespace>` after the problematic `helm upgrade` may get things working again by forcing a new worker pod to spin up with the recreated service account.

@ryanschwartz and @NAjustin, if you give that a try and it gets things working again, do let us know, as that will help us work out a proper solution!
@pmossman that will only restart the worker pod - manual intervention was needed to recreate the Role and RoleBinding for me, at which point the worker began functioning as expected.
Baffled and disappointed that this is not a higher priority. This happened to us as well when upgrading and completely broke things. Had to fix it manually, the way @ryanschwartz suggested.

For those who come after us, this is the YAML I applied by hand; adjust values as needed:
```yaml
# Source: airbyte/templates/serviceaccount.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airbyte-admin-role
  namespace: airbyte
  labels:
    helm.sh/chart: airbyte-0.551.0
    app.kubernetes.io/name: airbyte
    app.kubernetes.io/instance: airbyte
    app.kubernetes.io/version: "0.64.3"
    app.kubernetes.io/managed-by: Helm
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-5"
rules:
  - apiGroups: ["*"]
    resources: ["jobs", "pods", "pods/log", "pods/exec", "pods/attach", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] # over-permission for now
---
# Source: airbyte/templates/serviceaccount.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airbyte-admin-binding
  namespace: airbyte
  labels:
    helm.sh/chart: airbyte-0.551.0
    app.kubernetes.io/name: airbyte
    app.kubernetes.io/instance: airbyte
    app.kubernetes.io/version: "0.64.3"
    app.kubernetes.io/managed-by: Helm
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-3"
roleRef:
  apiGroup: ""
  kind: Role
  name: airbyte-admin-role
subjects:
  - kind: ServiceAccount
    name: airbyte-admin
```
Helm Chart Version
0.350.0
What step the error happened?
Upgrading the Platform or Helm Chart
Relevant information
New App version: 0.63.11
Prior App Version: 0.63.9
Platform: GKE (Autopilot cluster)
Everything upgraded fine, but when trying to check or sync connections, we started getting errors like:

```
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:airbyte-ns:REDACTED" cannot list resource "pods" in API group "" in the namespace "airbyte-ns"
Guest attributes endpoint access is disabled
"403 Forbidden" for request "PUT http://metadata.google.internal/computeMetadata/v1/instance/guest-attributes/guestInventory/Hostname"
```
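To check the first error directly, outside Airbyte, `kubectl auth can-i` can impersonate the service account and ask the API server whether it may list pods. A minimal sketch, assuming your own kubectl credentials are allowed to impersonate service accounts; the helper names `sa_user` and `sa_can` are hypothetical, and `airbyte-ns` / `airbyte-admin` are placeholders for your namespace and service account.

```shell
#!/usr/bin/env bash

# Build the username Kubernetes assigns to a pod authenticating as a service account.
sa_user() {
  local namespace="$1" name="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$name"
}

# Ask the API server whether that service account may perform a verb on a resource
# in its own namespace. Requires a configured kubectl; prints "yes" or "no".
sa_can() {
  local namespace="$1" name="$2" verb="$3" resource="$4"
  kubectl auth can-i "$verb" "$resource" \
    --as="$(sa_user "$namespace" "$name")" -n "$namespace"
}
```

For example, `sa_can airbyte-ns airbyte-admin list pods` would be expected to print `no` while the `Role` and `RoleBinding` are missing (unless another binding grants the permission) and `yes` once they are restored.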
It seems similar to some past threads:
I do already have all these set in config:
It seems like there may be a missing role binding or something along those lines. For what it's worth, we're using GKE Autopilot and a non-default service account (meaning not the default one provisioned for the cluster, and also not named `airbyte-admin`). As a workaround, I granted our SA `roles/container.clusterAdmin`, but it really shouldn't need these permissions to create pods in its own deployments.
I saw that another user reported a similar issue in this Slack thread.
Relevant log output
No response