cloudfoundry-incubator / kubecf

Cloud Foundry on Kubernetes
Apache License 2.0

fix: kubecf upgrade failure due to can't find multi-az scheduler #1663

Closed ShuangMen closed 3 years ago

ShuangMen commented 3 years ago

Description

Upgrading a multi-AZ kubecf deployment from v2.6.1 to v2.7.1 failed because the multi-AZ scheduler could not be found. This PR fixes that failure.

Motivation and Context

https://github.com/cloudfoundry-incubator/kubecf/issues/1662

How Has This Been Tested?

Ran helm upgrade on kubecf with the updated code. The upgrade process now proceeds, and all schedulers are upgraded successfully.

$ k get pod -n kubecf
NAME                                     READY   STATUS                  RESTARTS   AGE
api-z0-0                                 17/17   Running                 1          17m
api-z1-0                                 17/17   Running                 8          17m
auctioneer-0                             6/6     Running                 1          18m
bosh-dns-55f949b56d-6vbbq                1/1     Running                 0          4d20h
bosh-dns-55f949b56d-tgg5w                1/1     Running                 0          4d20h
cc-worker-z0-0                           6/6     Running                 0          18m
cc-worker-z1-0                           6/6     Running                 0          18m
cf-apps-dns-59f9f659f5-t94mh             1/1     Running                 0          27m
coredns-quarks-6db68476bd-ks6cj          1/1     Running                 0          3h32m
coredns-quarks-6db68476bd-pz5dn          1/1     Running                 0          3h32m
database-0                               2/2     Running                 0          27m
database-seeder-7a19efc54ebbb714-pqtbg   0/2     Completed               0          31d
database-seeder-d49344d80353dd73-gmljj   0/2     Completed               0          31d
diego-api-z0-0                           9/9     Running                 2          17m
diego-api-z1-0                           9/9     Running                 2          17m
diego-cell-z0-0                          0/12    Init:CrashLoopBackOff   6          17m
diego-cell-z1-0                          0/12    Init:CrashLoopBackOff   7          17m
doppler-z0-0                             6/6     Running                 0          18m
doppler-z1-0                             6/6     Running                 0          18m
log-api-z0-0                             9/9     Running                 0          17m
log-api-z1-0                             9/9     Running                 0          18m
log-cache-0                              10/10   Running                 0          17m
nats-z0-0                                7/7     Running                 0          18m
nats-z1-0                                7/7     Running                 0          18m
router-z0-0                              7/7     Running                 0          18m
router-z1-0                              7/7     Running                 4          18m
scheduler-z0-0                           12/12   Running                 1          17m
scheduler-z1-0                           12/12   Running                 1          17m
singleton-blobstore-z0-0                 8/8     Running                 0          18m
uaa-z0-0                                 8/8     Running                 0          18m
uaa-z1-0                                 8/8     Running                 0          18m

ShuangMen commented 3 years ago

Hi, could someone review this? @mook-as

ShuangMen commented 3 years ago

Added a check for the existence of the scheduler StatefulSet, to cover multi-AZ and multi-cluster kubecf deployments.
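
A minimal sketch of what such an existence check could look like, not the actual patch. The StatefulSet names (scheduler, scheduler-z0, scheduler-z1) are assumptions inferred from the pod names in the listing above, and the helper names statefulset_exists and existing_schedulers are hypothetical.

```shell
#!/bin/sh
# Sketch only: the StatefulSet names below are assumptions inferred from the
# pod listing (e.g. pod scheduler-z0-0 -> StatefulSet scheduler-z0); they are
# not taken from the actual change in this PR.

NAMESPACE="${NAMESPACE:-kubecf}"

# Succeeds (exit 0) if kubectl can fetch the named StatefulSet in $NAMESPACE.
# Any kubectl failure, including "not found", is treated as "does not exist".
statefulset_exists() {
  kubectl get statefulset "$1" --namespace "$NAMESPACE" >/dev/null 2>&1
}

# Prints, one per line, each candidate scheduler StatefulSet that exists.
# A single-AZ deployment would be expected to have only "scheduler";
# a multi-AZ deployment would have the per-zone variants instead.
existing_schedulers() {
  for name in scheduler scheduler-z0 scheduler-z1; do
    if statefulset_exists "$name"; then
      echo "$name"
    fi
  done
}
```

An upgrade step could then iterate over the output of existing_schedulers and skip any scheduler that is not present, rather than failing when a fixed name is missing.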