cloudfoundry / eirini

Pluggable container orchestration for Cloud Foundry, and a Kubernetes backend
Apache License 2.0

Enhancement: app readiness should depend on all containers #79

Open rosenhouse opened 5 years ago

rosenhouse commented 5 years ago

we're playing with automatic sidecar injection from Istio.

looks like Eirini considers the app "Running" if only 1 of the containers is running.

we probably want to change it so that all (non-init) containers must be Ready before CF sees the App as running.
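A rough sketch of the difference, in Go. The `ContainerStatus` type here is a simplified, hypothetical stand-in for the per-container status Kubernetes reports on a pod, not the real client type, and neither function is Eirini's actual code:

```go
package main

import "fmt"

// ContainerStatus is a hypothetical, simplified stand-in for the status
// Kubernetes reports for each non-init container in a pod.
type ContainerStatus struct {
	Name    string
	Running bool
	Ready   bool
}

// anyRunning models the behavior described above: a single running
// container is enough for the app to count as "Running".
func anyRunning(statuses []ContainerStatus) bool {
	for _, s := range statuses {
		if s.Running {
			return true
		}
	}
	return false
}

// allReady models the proposed rule: every non-init container must be Ready.
// A pod with no reported containers is treated as not ready.
func allReady(statuses []ContainerStatus) bool {
	if len(statuses) == 0 {
		return false
	}
	for _, s := range statuses {
		if !s.Ready {
			return false
		}
	}
	return true
}

func main() {
	// The app container is ready but the injected sidecar is not yet.
	statuses := []ContainerStatus{
		{Name: "app", Running: true, Ready: true},
		{Name: "istio-proxy", Running: true, Ready: false},
	}
	fmt.Println("any running:", anyRunning(statuses))
	fmt.Println("all ready:", allReady(statuses))
}
```

With an injected sidecar that is still starting, the first check reports the app as up while the second does not, which is the gap this issue is about.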

would y'all be open to a PR?

cc @tcdowney

related:

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/169660862

The labels on this github issue will be updated when the story is started.

julz commented 5 years ago

Hey - sounds good to us & a PR would be very welcome!

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/169674530

The labels on this github issue will be updated when the story is started.

rosenhouse commented 4 years ago

We're working on this and related things under the heading of "get CATS passing when Eirini has Istio sidecars"

Today's issue: in Diego, with multiple containers (system-provided Envoy, or user-provided sidecar), if any of the containers (garden peas) crashes, Diego tears down and reschedules the whole pod.

In Kubernetes, this doesn't appear to be the behavior. It isn't the default, and it isn't even something directly supported in a PodSpec -- we'd have to do a bunch of work (extra wiring somehow) to get K8s to mimic the Diego behavior.

It seems the K8s preferred behavior is to restart the crashed container in-place, keeping the pod intact.

Does this sound right to folks?

@julz @alex-slynko @JulzDiverse

cc @emalm @zrob

julz commented 4 years ago

I guess off the top of my head the first question is whether you actually need the Diego behaviour. If the sidecar crashes then it'll get restarted, at which point either (a) the main container starts working again or (b) the main container fails its health check, is restarted, works - either way the system is back up and running? Is there a case where we need the whole pod to be torn down if a container fails?

cwlbraa commented 4 years ago

Using liveness checks to determine when to reschedule is actually much better than the Diego codependent behavior for user-provided sidecars.

The user story that comes to mind is when you have a memory-hungry APM agent (in Java or Ruby, for example) running next to a lighter-weight app. If the APM sidecar exceeds its memory limits, ideally we OOM-kill the APM agent and restart just that container without taking down the app.

The situation that's interesting here is what to do in the absence of a user-provided health check... what the CC API calls a "process" type health check. Should Eirini provide a liveness probe that confirms that the main process is running?
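To make the question concrete, here is one possible mapping from CC health check types to liveness probes, sketched in Go. The `Probe` type and `livenessFor` function are hypothetical and simplified, not Eirini's code or the Kubernetes API types. The "process" branch assumes that the kubelet restarting a container whose main process exits is sufficient, which is exactly the assumption being asked about:

```go
package main

import "fmt"

// Probe is a hypothetical, simplified description of a liveness probe.
// An empty Kind means "no probe configured".
type Probe struct {
	Kind string // "httpGet", "tcpSocket", or ""
	Path string
	Port int
}

// livenessFor maps a CC health check type ("http", "port", "process")
// to a probe description. This is one possible mapping, not a settled design.
func livenessFor(checkType, endpoint string, port int) (Probe, error) {
	switch checkType {
	case "http":
		return Probe{Kind: "httpGet", Path: endpoint, Port: port}, nil
	case "port":
		return Probe{Kind: "tcpSocket", Port: port}, nil
	case "process":
		// Assumption: no explicit probe. When the container's main process
		// exits, the container terminates and is restarted according to the
		// pod's restart policy.
		return Probe{}, nil
	default:
		return Probe{}, fmt.Errorf("unknown health check type %q", checkType)
	}
}

func main() {
	for _, t := range []string{"http", "port", "process"} {
		p, err := livenessFor(t, "/healthz", 8080)
		fmt.Printf("%s -> %+v (err: %v)\n", t, p, err)
	}
}
```

If relying on container exit turns out not to be enough, the "process" branch is where an explicit probe would go.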