cloudfoundry / cloud_controller_ng

Cloud Foundry Cloud Controller
Apache License 2.0

Support readiness and liveness checks #1706

Closed GFriedrich closed 7 months ago

GFriedrich commented 4 years ago

Currently, Cloud Foundry supports HTTP health checks, which perform a simple HTTP check against a specific endpoint. If these checks fail for a certain amount of time, the instance is restarted.

Unfortunately, there are cases where a restart doesn't help and the application potentially knows about it (e.g. a broken downstream service). Instead of shutting down the running instance, Cloud Foundry should stop routing requests to the instance but leave it running. Currently, Cloud Foundry tries to restart the instance for some time and eventually leaves it in a shutdown state, where it stays even after the root cause has been fixed. Manual intervention is then required to bring the instance back to life.

Therefore, I suggest giving users the ability to configure two different HTTP endpoints as health checks, one for readiness and one for liveness, and to decide what to do depending on which endpoint has failed.
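To make the proposal concrete, here is a minimal sketch of the decision described above. It is only an illustration under assumed semantics: the names `Action` and `decide` are hypothetical and do not correspond to anything in Cloud Foundry or Diego.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of evaluating both checks for one instance."""
    KEEP_ROUTING = "keep_routing"  # instance is alive and ready for traffic
    STOP_ROUTING = "stop_routing"  # instance stays up but receives no requests
    RESTART = "restart"            # instance is considered dead


def decide(liveness_ok: bool, readiness_ok: bool) -> Action:
    """Pick an action from the results of the two (assumed) HTTP checks."""
    if not liveness_ok:
        # A failed liveness check means the process itself is broken.
        return Action.RESTART
    if not readiness_ok:
        # Alive but not ready, e.g. a downstream service is unavailable.
        return Action.STOP_ROUTING
    return Action.KEEP_ROUTING
```

With this split, an application whose downstream dependency is broken would fail only the readiness endpoint, so the instance would be taken out of routing instead of being restarted, and it could rejoin once the readiness endpoint succeeds again.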

What do you think?

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/173546379

The labels on this GitHub issue will be updated when the story is started.

cwlbraa commented 4 years ago

Is there a difference between the way CF healthchecks work today and how they'd work if they were liveness checks?

FWIW I do not believe this is possible with a Diego backend, but it is certainly possible with help from k8s+eirini.

GFriedrich commented 4 years ago

@cwlbraa: Indeed, the CF healthcheck and the liveness check would be the same. That said, it would be great if one could configure the port used by the HTTP healthcheck. Currently, CF always uses the very first application port for the HTTP check, and all other ports are checked via TCP healthchecks. That effectively forces you to make the first port the one that should be HTTP-checked.
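The port behaviour described above can be sketched roughly as follows. This is an illustration only, not Cloud Foundry code: `plan_checks`, the dictionary layout, and the default endpoint `/health` are made up for the example, and the `http_port` parameter stands for the configurable port being asked for, which does not exist today according to this comment.

```python
from typing import Dict, List, Optional


def plan_checks(ports: List[int],
                http_endpoint: str = "/health",
                http_port: Optional[int] = None) -> List[Dict[str, object]]:
    """Return one check description per application port.

    With http_port=None this mirrors the behaviour described in the comment:
    the first port gets the HTTP check, all other ports get TCP checks.
    Passing http_port models the requested (hypothetical) option.
    """
    if not ports:
        return []
    target = ports[0] if http_port is None else http_port
    if target not in ports:
        raise ValueError(f"port {target} is not an application port")
    checks: List[Dict[str, object]] = []
    for port in ports:
        if port == target:
            checks.append({"port": port, "type": "http", "endpoint": http_endpoint})
        else:
            checks.append({"port": port, "type": "tcp"})
    return checks
```

For example, `plan_checks([8080, 9090])` puts the HTTP check on 8080, while `plan_checks([8080, 9090], http_port=9090)` would move it to 9090 without requiring the ports to be reordered.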

philippthun commented 7 months ago

Closing this issue as the "[RFC 630] add readiness healthchecks for apps" (https://github.com/cloudfoundry/community/pull/630) has been accepted and the implementation is on its way.