cloudfoundry-incubator / quarks-operator

BOSH releases deployed on Kubernetes
https://www.cloudfoundry.org/project-quarks/
Apache License 2.0

fail to deploy kubecf 2.5.8 with multi_az enabled #1206

Closed ShuangMen closed 4 years ago

ShuangMen commented 4 years ago

Describe the bug: When deploying kubecf 2.5.8 with multi_az enabled, only the pods in one zone (z0) reach Running. The pods in the other zone (z1) stay in Init:CrashLoopBackOff and fail with the error "no job instance found for spec index '10000'".

$ kubectl get pod -n kubecf
NAME                                     READY   STATUS                  RESTARTS   AGE
api-z0-0                                 17/17   Running                 1          123m
api-z1-0                                 0/17    Init:CrashLoopBackOff   28         123m
auctioneer-z0-0                          6/6     Running                 1          123m
auctioneer-z1-0                          0/6     Init:CrashLoopBackOff   28         123m
bosh-dns-5bb6c47fb7-pfmbc                1/1     Running                 0          127m
bosh-dns-5bb6c47fb7-sw5vm                1/1     Running                 0          127m
cc-worker-z0-0                           6/6     Running                 0          123m
cc-worker-z1-0                           0/6     Init:CrashLoopBackOff   27         123m
cf-apps-dns-68f75c9f4b-67kl8             1/1     Running                 0          128m
cf-apps-dns-68f75c9f4b-hxfjp             1/1     Running                 0          128m
database-0                               2/2     Running                 0          126m
database-seeder-3bde40fd54eee0ab-k4kk2   0/2     Completed               0          127m
diego-api-z0-0                           9/9     Running                 2          123m
diego-api-z1-0                           0/9     Init:CrashLoopBackOff   28         123m
doppler-z0-0                             6/6     Running                 0          123m
doppler-z1-0                             0/6     Init:CrashLoopBackOff   28         123m
log-api-z0-0                             9/9     Running                 0          123m
log-api-z1-0                             0/9     Init:CrashLoopBackOff   28         123m
log-cache-z0-0                           10/10   Running                 0          123m
log-cache-z1-0                           0/10    Init:CrashLoopBackOff   27         123m
nats-z0-0                                7/7     Running                 0          123m
nats-z1-0                                0/7     Init:CrashLoopBackOff   28         123m
router-z0-0                              7/7     Running                 0          123m
router-z1-0                              0/7     Init:CrashLoopBackOff   28         123m
scheduler-z0-0                           13/13   Running                 1          123m
scheduler-z1-0                           0/13    Init:CrashLoopBackOff   27         123m
singleton-blobstore-0                    8/8     Running                 0          123m
uaa-z0-0                                 8/8     Running                 0          123m
uaa-z1-0                                 0/8     Init:CrashLoopBackOff   28         123m
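
To pull just the failing pods out of a listing like the one above, a small filter can help. This is a minimal sketch, not part of the original report: the function name `failing_pods` is hypothetical, and it assumes the default `kubectl get pod` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# failing_pods: hypothetical helper, not from the original report.
# Reads `kubectl get pod` table output on stdin and prints the names of
# pods whose STATUS column is neither Running nor Completed.
failing_pods() {
  # NF >= 3 skips blank lines; $1 != "NAME" skips the header row.
  awk 'NF >= 3 && $1 != "NAME" && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage (assumes the kubecf namespace used in this report):
#   kubectl get pod -n kubecf | failing_pods
```

Applied to the listing above, this should print only the `-z1-` pods, which makes the zone pattern easy to see.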

$ kubectl logs -f nats-z1-0 template-render -n kubecf
+ cf-operator util template-render
2020/10/26 05:18:21 no job instance found for spec index '10000'
real    0m5.385s
user    0m0.299s
sys     0m0.091s
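
To check whether the other failing pods report the same error, the template-render log can be collected from several pods at once. This is a rough sketch under assumptions: the function name `render_logs` is hypothetical, and it assumes each failing pod has an init container named `template-render`, as nats-z1-0 does above.

```shell
# render_logs: hypothetical helper, not from the original report.
# Reads pod names on stdin (one per line) and prints the last lines of
# each pod's template-render container log in the given namespace.
render_logs() {
  ns="$1"
  while read -r pod; do
    [ -n "$pod" ] || continue            # skip empty lines
    echo "== $pod"
    # </dev/null keeps kubectl from consuming the pod list on stdin
    kubectl logs "$pod" -c template-render -n "$ns" --tail=5 </dev/null
  done
}

# Usage:
#   printf '%s\n' nats-z1-0 router-z1-0 | render_logs kubecf
```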

To Reproduce: Deploy cf-operator 6.1.17, then deploy kubecf 2.5.8 with multi_az enabled.

Expected behavior: After the deployment, all pods should reach Running status.

Environment: cf-operator 6.1.17, kubecf 2.5.8

Additional context: A similar issue was also created at https://github.com/cloudfoundry-incubator/kubecf/issues/1512

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175433106

The labels on this github issue will be updated when the story is started.

ShuangMen commented 4 years ago

Pull request for this issue: https://github.com/cloudfoundry-incubator/quarks-operator/pull/1210

manno commented 4 years ago

Hopefully fixed by the merged PR