Open rosenhouse opened 5 years ago
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/169660862
The labels on this github issue will be updated when the story is started.
Hey - sounds good to us & a PR would be very welcome!
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/169674530
The labels on this github issue will be updated when the story is started.
We're working on this and related things under the heading of "get CATS passing when Eirini has Istio sidecars"
Today's issue: In Diego, with multiple containers (system-provided Envoy, or user-provided sidecar), if any of the containers (garden peas) crashes, Diego tears down and reschedules the whole pod.
In Kubernetes, this doesn't appear to be the behavior. It's not the default, and it's not even something directly supported in a PodSpec -- we'd have to do a bunch of work (extra wiring somehow) to get K8s to mimic the Diego behavior.
It seems the K8s preferred behavior is to restart the crashed container in-place, keeping the pod intact.
Does this sound right to folks?
@julz @alex-slynko @JulzDiverse
cc @emalm @zrob
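To make the contrast above concrete, here is a toy model of the two crash-handling behaviors. It is only a sketch of the description in this comment, not code from Diego or Kubernetes; the `handle_crash` function, the `whole_pod` flag, and the idea of tracking a per-container start count are all assumptions made for illustration.

```python
from typing import Dict


def handle_crash(starts: Dict[str, int], crashed: str, whole_pod: bool) -> Dict[str, int]:
    """Return updated per-container start counts after `crashed` exits unexpectedly.

    whole_pod=True models the Diego behavior described above (everything is
    torn down and started again); whole_pod=False models an in-place restart
    of only the crashed container.
    """
    if crashed not in starts:
        raise KeyError(f"unknown container: {crashed}")
    return {
        name: count + 1 if (whole_pod or name == crashed) else count
        for name, count in starts.items()
    }


starts = {"app": 1, "envoy": 1}
diego_like = handle_crash(starts, "envoy", whole_pod=True)   # both containers start again
k8s_like = handle_crash(starts, "envoy", whole_pod=False)    # only envoy starts again
```

In the second case the app container is never interrupted, which is the difference in question here.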
I guess off the top of my head the first question is whether you actually need the Diego behaviour. If the sidecar crashes then it'll get restarted, at which point either (a) the main container starts working again or (b) the main container fails its health check, is restarted, works - either way the system is back up and running? Is there a case where we need the whole pod to be torn down if a container fails?
Using liveness checks to determine when to reschedule is actually much better than the Diego codependent behavior for user-provided sidecars.
The user story that comes to mind is when you have a memory-hungry APM agent (in Java or Ruby, for example) running next to a lighter-weight app. If the APM sidecar exceeds its memory limits, ideally we OOM kill the APM and restart the pod without taking down the app.
The situation that's interesting here is what to do in the absence of a user-provided health check... What the CC API calls a "process" type health check. Should Eirini provide a liveness probe that confirms that the main process is running?
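One way to picture the question above is a sketch that attaches an exec-style liveness probe to a container definition. The field names (`livenessProbe`, `exec`, `command`, `periodSeconds`, `failureThreshold`) are standard Kubernetes container fields; the helper name, the image, the timing values, and the check command are placeholders, not anything Eirini does today.

```python
from typing import Any, Dict, List


def with_process_liveness_probe(container: Dict[str, Any], check_command: List[str]) -> Dict[str, Any]:
    """Return a copy of a container spec with an exec liveness probe added.

    The probe fails when `check_command` exits non-zero, at which point the
    kubelet restarts that container only.
    """
    probe = {
        "exec": {"command": check_command},
        "periodSeconds": 10,     # assumed interval, tune as needed
        "failureThreshold": 3,   # assumed number of failures before a restart
    }
    return {**container, "livenessProbe": probe}


# Hypothetical container and check command, for illustration only.
app_container = with_process_liveness_probe(
    {"name": "app", "image": "example/app"},
    ["pgrep", "-f", "app-start-command"],
)
```

Whether a check like this adds anything beyond what the kubelet already does when a container's main process exits is part of what would need deciding.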
we're playing with automatic sidecar injection from Istio.
looks like Eirini considers the app "Running" if only 1 of the containers is running.
we probably want to change it so that all (non-init) containers must be Ready before CF sees the App as running.
would y'all be open to a PR?
cc @tcdowney
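The proposed rule could look roughly like the sketch below. It assumes the input mirrors a pod's `status.containerStatuses` list, which in Kubernetes covers only the regular containers (init containers are reported separately under `initContainerStatuses`). The `ContainerStatus` dataclass and `app_is_running` function are illustrative stand-ins, not Eirini's actual types, and the container names are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerStatus:
    # Mirrors the `name` and `ready` fields of a Kubernetes container status.
    name: str
    ready: bool


def app_is_running(container_statuses: List[ContainerStatus]) -> bool:
    """Report the app as Running only when every non-init container is Ready."""
    # An empty list means no statuses have been reported yet, so not running.
    return bool(container_statuses) and all(s.ready for s in container_statuses)


both_ready = app_is_running([ContainerStatus("app", True), ContainerStatus("istio-proxy", True)])
sidecar_pending = app_is_running([ContainerStatus("app", True), ContainerStatus("istio-proxy", False)])
```

With this rule an app whose injected sidecar is still starting would not yet be reported as running.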
related:
72