cloudfoundry / cf-for-k8s

The open source deployment manifest for Cloud Foundry on Kubernetes
Apache License 2.0

Default builder resource sometimes incorrectly reports itself as not ready #638

Closed: jspawar closed this issue 2 years ago

jspawar commented 3 years ago

## Describe the bug

The default builder resource sometimes incorrectly reports itself as explicitly not ready, in turn causing the kapp deployment to fail.

We also believe this behavior is incorrect. While the builder is still transitioning, it sets its Ready condition to False rather than Unknown, which is the generally recommended way to signal a transient state.
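The distinction matters because a tool waiting on the resource can only tell "not ready yet" apart from "failed" by the condition's status value. The Go sketch below is a hypothetical, simplified waiter. The names `evaluateReady`, `Condition`, and `Outcome` are illustrative only and are not kapp's or kpack's API. It treats `Ready == True` as done and `Ready == False` as a terminal failure, which is how kapp reported this failure. It treats `Ready == Unknown`, or a missing Ready condition, as still in progress.

```go
package main

import "fmt"

// ConditionStatus mirrors the three values Kubernetes-style conditions use.
type ConditionStatus string

const (
	StatusTrue    ConditionStatus = "True"
	StatusFalse   ConditionStatus = "False"
	StatusUnknown ConditionStatus = "Unknown"
)

// Condition is a simplified status condition (hypothetical, for illustration).
type Condition struct {
	Type    string
	Status  ConditionStatus
	Message string
}

// Outcome is what a deploy tool waiting on the resource would conclude.
type Outcome string

const (
	Succeeded Outcome = "succeeded"
	Failed    Outcome = "failed"
	Pending   Outcome = "pending"
)

// evaluateReady is a hypothetical, simplified waiter: Ready=True finishes,
// Ready=False is treated as a terminal failure, and Ready=Unknown (or no
// Ready condition at all) means "keep waiting".
func evaluateReady(conds []Condition) Outcome {
	for _, c := range conds {
		if c.Type != "Ready" {
			continue
		}
		switch c.Status {
		case StatusTrue:
			return Succeeded
		case StatusFalse:
			return Failed
		default:
			return Pending
		}
	}
	return Pending
}

func main() {
	// What the builder reported in the failed run.
	reported := []Condition{{Type: "Ready", Status: StatusFalse, Message: "stack bionic-stack is not ready"}}
	// What the recommended convention would report while the stack reconciles.
	recommended := []Condition{{Type: "Ready", Status: StatusUnknown, Message: "stack bionic-stack is not ready"}}

	fmt.Println(evaluateReady(reported))
	fmt.Println(evaluateReady(recommended))
}
```

Under these assumptions, a builder that reports `False` while its stack is still reconciling ends the wait with a failure. Reporting `Unknown` would instead let the waiter keep polling until the stack became ready.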

## To Reproduce

Steps to reproduce the behavior:

1. Deploy cf-for-k8s from the `develop` branch.
2. Wait for the deployment to finish.
3. Observe that the deployment fails with the following error:

   ```
   ---- applying 1 changes [287/288 done] ----
   create builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging
   ---- waiting on 8 changes [280/288 done] ----
   fail: reconcile builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging
   ^ Encountered failure condition Ready == False: (message: stack bionic-stack is not ready)

   kapp: Error: waiting on reconcile builder/cf-default-builder (kpack.io/v1alpha1) namespace: cf-workloads-staging:
   Finished unsuccessfully (Encountered failure condition Ready == False: (message: stack bionic-stack is not ready))
   ```

## Expected behavior
The deployment succeeds without error.

## Additional context
We observed this failure in our automation: https://release-integration.ci.cf-app.com/teams/main/pipelines/cf-for-k8s-main/jobs/validate-external-db/builds/426

We inspected the cluster some time after this failure and observed that the Builder and the related ClusterStack resources were both reporting their status as ready.
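One way to repeat this inspection is to read the Ready condition from the resource's JSON, for example the output of `kubectl get builder cf-default-builder -n cf-workloads-staging -o json`. The Go sketch below assumes that the resource exposes Kubernetes-style conditions under `.status.conditions`, with `type`, `status`, and `message` fields. The sample document and the `readyCondition` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceStatus models only the part of the resource's JSON we need
// (assumed shape: .status.conditions[] with type, status and message).
type resourceStatus struct {
	Status struct {
		Conditions []struct {
			Type    string `json:"type"`
			Status  string `json:"status"`
			Message string `json:"message,omitempty"`
		} `json:"conditions"`
	} `json:"status"`
}

// readyCondition is a hypothetical helper returning the Ready condition's
// status and message; found is false if there is no Ready condition.
func readyCondition(doc []byte) (string, string, bool, error) {
	var r resourceStatus
	if err := json.Unmarshal(doc, &r); err != nil {
		return "", "", false, err
	}
	for _, c := range r.Status.Conditions {
		if c.Type == "Ready" {
			return c.Status, c.Message, true, nil
		}
	}
	return "", "", false, nil
}

func main() {
	// Hypothetical, trimmed output of:
	//   kubectl get builder cf-default-builder -n cf-workloads-staging -o json
	sample := []byte(`{
	  "kind": "Builder",
	  "status": {
	    "conditions": [
	      {"type": "Ready", "status": "True"}
	    ]
	  }
	}`)

	status, msg, found, err := readyCondition(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("found=%v ready=%s message=%q\n", found, status, msg)
}
```

The same check can be run against the ClusterStack. Because this is a single point-in-time read, it only shows the state at that moment. It cannot show whether Ready briefly reported False earlier.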

Subsequent deployments all succeeded, so the failure appears to be intermittent (flaky).

### cf-for-k8s SHA 
https://github.com/cloudfoundry/cf-for-k8s/commit/e54d9862e91b5d38922ddde8fa82c0922a209885

### Deploy instructions
Standard deployment

### Cluster information
- Provider: GKE
- Cluster version: v1.19.7-gke.1500
- Kubectl version: v1.15.7

### CLI versions
1. `ytt --version`: v0.31.0
2. `kapp --version`: v0.35.0
3. `kubectl version`: v1.15.7
4. `cf version`: N/A (didn't get that far)

cf-gitbot commented 3 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177276050

The labels on this github issue will be updated when the story is started.