cloudfoundry-incubator / kubecf

Cloud Foundry on Kubernetes
Apache License 2.0

helm upgrade fail in multi-az kubecf #1662

Closed ShuangMen closed 3 years ago

ShuangMen commented 3 years ago

Describe the bug: When upgrading kubecf from v2.6.1 to v2.7.1 with multi-az enabled, the upgrade fails in the pre-upgrade hook.

$ k get pod -n kubecf
NAME                                     READY   STATUS      RESTARTS   AGE
api-z0-0                                 17/17   Running     1          30d
api-z1-0                                 17/17   Running     93         30d
auctioneer-0                             6/6     Running     70         30d
bosh-dns-55f949b56d-6vbbq                1/1     Running     0          4d17h
bosh-dns-55f949b56d-tgg5w                1/1     Running     0          4d17h
cc-worker-z0-0                           6/6     Running     0          30d
cc-worker-z1-0                           6/6     Running     48         30d
cf-apps-dns-dcb9687ff-g6k2x              1/1     Running     0          30d
coredns-quarks-6db68476bd-ks6cj          1/1     Running     0          36m
coredns-quarks-6db68476bd-pz5dn          1/1     Running     0          36m
database-0                               2/2     Running     0          30d
database-seeder-7a19efc54ebbb714-pqtbg   0/2     Completed   0          30d
database-seeder-d49344d80353dd73-gmljj   0/2     Completed   0          30d
diego-api-z0-0                           9/9     Running     3          21d
diego-api-z1-0                           9/9     Running     8          21d
diego-cell-z0-0                          12/12   Running     2          30d
diego-cell-z1-0                          12/12   Running     2          30d
doppler-z0-0                             6/6     Running     0          30d
doppler-z1-0                             6/6     Running     0          30d
kubecf-pre-upgrade-hook-2t2f4            0/1     Error       0          21m
kubecf-pre-upgrade-hook-56b6h            0/1     Error       0          19m
kubecf-pre-upgrade-hook-7qht2            0/1     Error       0          22m
kubecf-pre-upgrade-hook-7rq7q            0/1     Error       0          16m
kubecf-pre-upgrade-hook-grgx8            0/1     Error       0          22m
kubecf-pre-upgrade-hook-nxxlz            0/1     Error       0          20m
kubecf-pre-upgrade-hook-xpsjz            0/1     Error       0          21m
log-api-z0-0                             9/9     Running     6          30d
log-api-z1-0                             9/9     Running     6          30d
log-cache-0                              10/10   Running     210        30d
nats-z0-0                                7/7     Running     0          30d
nats-z1-0                                7/7     Running     0          30d
router-z0-0                              7/7     Running     0          30d
router-z1-0                              7/7     Running     0          30d
scheduler-z0-0                           12/13   Running     197        30d
scheduler-z1-0                           12/13   Running     210        30d
singleton-blobstore-z0-0                 8/8     Running     0          30d
uaa-z0-0                                 8/8     Running     0          30d
uaa-z1-0                                 8/8     Running     0          30d
$ k logs kubecf-pre-upgrade-hook-2t2f4 -n kubecf
+ shopt -s nullglob
+ for f in /hooks/*.sh
+ bash /hooks/remove-deployment-updater-readiness.sh
+ patch='
---
spec:
  template:
    spec:
      containers:
      - name: cc-deployment-updater-cc-deployment-updater
        readinessProbe: ~
'
+ kubectl patch statefulset --namespace kubecf scheduler --patch '
---
spec:
  template:
    spec:
      containers:
      - name: cc-deployment-updater-cc-deployment-updater
        readinessProbe: ~
'
Error from server (NotFound): statefulsets.apps "scheduler" not found
+ exit 1
$ k get statefulset -n kubecf
NAME                     READY   AGE
api-z0                   1/1     31d
api-z1                   1/1     31d
auctioneer               1/1     31d
cc-worker-z0             1/1     31d
cc-worker-z1             1/1     31d
database                 1/1     31d
diego-api-z0             1/1     31d
diego-api-z1             1/1     31d
diego-cell-z0            1/1     31d
diego-cell-z1            1/1     31d
doppler-z0               1/1     31d
doppler-z1               1/1     31d
log-api-z0               1/1     31d
log-api-z1               1/1     31d
log-cache                1/1     31d
nats-z0                  1/1     31d
nats-z1                  1/1     31d
router-z0                1/1     31d
router-z1                1/1     31d
scheduler-z0             0/1     31d
scheduler-z1             0/1     31d
singleton-blobstore-z0   1/1     31d
uaa-z0                   1/1     31d
uaa-z1                   1/1     31d

With multi-az enabled there is no statefulset named scheduler; the statefulsets are named scheduler-z0 and scheduler-z1.

To Reproduce: Run helm upgrade from v2.6.1 to v2.7.1 with multi-az enabled.

Expected behavior: helm upgrade completes successfully.


ShuangMen commented 3 years ago

https://github.com/cloudfoundry-incubator/kubecf/blob/8e6f60285e9ef3f065e604a82ba9914130805ba1/chart/hooks/pre-upgrade/remove-deployment-updater-readiness.sh#L22 The statefulset name 'scheduler' is hard-coded here.
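
One way to avoid the hard-coded name is to look up the scheduler statefulsets at run time and patch each one. The following is a minimal sketch, not the change that was merged: the function names and the `KUBECTL` override variable are made up for illustration, and it assumes the zone suffix always has the form `-z<N>`, as in the listing above. Listing statefulsets also requires that the hook's service account is allowed to list them.

```shell
#!/usr/bin/env bash
# Sketch only: patch every scheduler statefulset instead of one hard-coded name.
# KUBECTL and both function names are illustrative, not part of the kubecf chart.
KUBECTL="${KUBECTL:-kubectl}"

# Reads statefulset names on stdin and keeps "scheduler" or "scheduler-z<N>",
# with or without the "statefulset.apps/" prefix that "--output name" prints.
filter_scheduler_statefulsets() {
  grep -E '^(statefulset\.apps/)?scheduler(-z[0-9]+)?$' || true
}

# Removes the readiness probe from each scheduler statefulset in namespace $1.
patch_scheduler_statefulsets() {
  local namespace="$1" names name
  local patch='
---
spec:
  template:
    spec:
      containers:
      - name: cc-deployment-updater-cc-deployment-updater
        readinessProbe: ~
'
  names="$("${KUBECTL}" get statefulsets --namespace "${namespace}" --output name)" || return 1
  names="$(filter_scheduler_statefulsets <<<"${names}")"
  if [[ -z "${names}" ]]; then
    echo "no scheduler statefulset found in ${namespace}" >&2
    return 1
  fi
  while IFS= read -r name; do
    "${KUBECTL}" patch "${name}" --namespace "${namespace}" --patch "${patch}" || return 1
  done <<<"${names}"
}
```

Calling `patch_scheduler_statefulsets kubecf` would then cover both the single-az case (`scheduler`) and the multi-az case (`scheduler-z0`, `scheduler-z1`). An anchored pattern is used rather than a plain `grep scheduler` so that an unrelated name that merely contains the word is not patched.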

ShuangMen commented 3 years ago

After updating the script to list and find all the scheduler statefulsets, I hit another error:

$ k logs kubecf-pre-upgrade-hook-6qm5j -n kubecf
+ shopt -s nullglob
+ for f in /hooks/*.sh
+ bash /hooks/remove-deployment-updater-readiness.sh
+ patch='
---
spec:
  template:
    spec:
      containers:
      - name: cc-deployment-updater-cc-deployment-updater
        readinessProbe: ~
'
++ cut -d ' ' -f 1
++ grep scheduler
++ kubectl get statefulsets --namespace kubecf
Error from server (Forbidden): statefulsets.apps is forbidden: User "system:serviceaccount:kubecf:pre-upgrade-helm-hook" cannot list resource "statefulsets" in API group "apps" in the namespace "kubecf"
+ scheduler_list=
+ exit 1

The pre-upgrade-helm-hook service account needs permission to list statefulsets in the apps API group.
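
For illustration, the missing permission could be granted and checked imperatively with kubectl, as sketched below. This is an assumption about one possible approach, not the merged fix, which this thread does not show and which would more likely adjust the chart's RBAC templates. The role name `pre-upgrade-hook-statefulsets`, the function names, and the `KUBECTL` override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: give a hook service account access to statefulsets in one namespace.
# The role name and function names are hypothetical, not taken from the kubecf chart.
KUBECTL="${KUBECTL:-kubectl}"

# Creates a namespaced Role and RoleBinding for service account $2 in namespace $1.
grant_statefulset_access() {
  local namespace="$1" service_account="$2"
  local role="pre-upgrade-hook-statefulsets"
  "${KUBECTL}" create role "${role}" \
    --namespace "${namespace}" \
    --verb=get --verb=list --verb=patch \
    --resource=statefulsets.apps || return 1
  "${KUBECTL}" create rolebinding "${role}" \
    --namespace "${namespace}" \
    --role="${role}" \
    --serviceaccount="${namespace}:${service_account}" || return 1
}

# Asks the API server whether the service account may list statefulsets.
can_list_statefulsets() {
  local namespace="$1" service_account="$2"
  "${KUBECTL}" auth can-i list statefulsets.apps \
    --namespace "${namespace}" \
    --as="system:serviceaccount:${namespace}:${service_account}"
}
```

For the account named in the error above, the calls would be `grant_statefulset_access kubecf pre-upgrade-helm-hook` followed by `can_list_statefulsets kubecf pre-upgrade-helm-hook`. Running the check requires that the caller is allowed to impersonate the service account.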

ShuangMen commented 3 years ago

The fix has been merged; closing this issue.