jandubois closed this issue 3 years ago.
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/178014456
The labels on this github issue will be updated when the story is started.
> set the stamps for all terminated containers explicitly before counting how many drain scripts have finished

What I mean by this is inserting the logic from the script above into the `waitExit` function, at this line, just after the `sleep 5` statement.
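For illustration, this is roughly where that logic would go. The loop shown here is hypothetical (the actual `waitExit` implementation is not quoted in this issue); only the placement after the `sleep 5` matters:

```bash
# Hypothetical shape of waitExit -- condition and helper names are placeholders.
waitExit() {
  while ! all_drain_stamps_present; do   # placeholder for the real exit condition
    sleep 5
    # NEW: run the stamp-fixup logic from the PoC here, so containers that have
    # already terminated are counted as drained before the next check.
    create_stamps_for_terminated_containers   # placeholder wrapping the PoC script
  done
}
```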
I would prefer if the drain scripts were patched to behave better on Kubernetes. We could put a wrapper around `rep` to keep it from terminating.
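A rough sketch of what such a wrapper could look like, purely as an illustration; the `rep` binary path is an assumption, not something specified in this thread:

```bash
#!/bin/bash
# Hypothetical wrapper around rep: if rep exits on its own, keep the container
# alive until Kubernetes sends SIGTERM, so the drain handling is not cut short.
set -u

/var/vcap/packages/rep/bin/rep "$@" &   # assumed path to the real rep binary
rep_pid=$!

got_term=0
on_term() {
  got_term=1
  kill -TERM "${rep_pid}" 2>/dev/null || true   # forward the signal to rep
}
trap on_term TERM INT

# Wait for rep; wait also returns early when a trapped signal arrives.
wait "${rep_pid}" || true

# Do not let the container exit until we have actually been asked to terminate.
while [ "${got_term}" -eq 0 ]; do
  sleep 1
done
```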
These problems come to mind when adding cluster inspection to the container drain script:

- The drain script is already doing a lot and it feels wrong to put more logic there. We could implement this in Go to get rid of the dependencies on jq/kubectl. However, how would that work? Would we add another sidecar container (with the operator image) and run the new `preStop` hook there?
- I don't think we can securely add get/list pods RBAC to the service account used by the pods.
- If we need to check cluster state, we will need to add this to the operator somehow (i.e. a pod with more access).

The first thing that comes to my mind is switching to HTTP `PreStop` hooks. We'd need a small API server to implement the checks.
This approach does not work, because Kubernetes never switches a container status to `terminated` when the main process in the container exits while the `PreStop` hook is still running. Instead the hook will be killed and exits with code `137`, but the container status remains `running` until the grace period expires, even though there are no processes left running inside the container.
We'll look into an alternate approach: modifying `container-run` to not exit until it receives a `SIGTERM` signal.
Yes, that's a good description of why this type of exit will not work. Thanks!
As @univ0298 points out in Slack, the drain script support from #1302 is still incomplete:

When the main process of a container exits, the `preStop` script will not regain control, because there is nothing left to do: the container is already stopped. This is "documented" in "Add e2e to verify blocking behavior of preStop hook" and "PreStop process is not completed when within termination grace period".

My suggestion to work around this issue is to explicitly set the stamps for all terminated containers before counting how many drain scripts have finished. Proof of concept code:

This generates a list of all containers that have a `state.terminated` entry in `status.containerStatuses`. I then scrape the actual stamp filename out of the `preStop` commands and make sure the stamp file exists. It creates stamps for those containers that didn't get a chance to do it themselves.

I've tested the script by replacing the condition above with `state.running` and using `echo` instead of `touch`.

I know that we still have other issues with drain scripts, but this should get us one step closer.
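The proof-of-concept snippet itself is not reproduced above, but based on the description it could look roughly like the following sketch. The pod name coming from `$HOSTNAME` and the stamp-path pattern are assumptions on my part, not details from the original script:

```bash
#!/bin/bash
# Rough sketch of the described approach, not the original PoC. Assumes kubectl
# and jq are available in the container and the preStop exec command contains
# the stamp file path (the grep pattern below is only a guess).
set -uo pipefail

pod_json="$(kubectl get pod "${HOSTNAME}" -o json)"

# All containers that already have a state.terminated entry in containerStatuses.
terminated="$(jq -r '.status.containerStatuses[]? | select(.state.terminated) | .name' <<<"${pod_json}")"

for name in ${terminated}; do
  # Scrape the stamp file name out of that container's preStop exec command ...
  stamp="$(jq -r --arg name "${name}" \
      '.spec.containers[] | select(.name == $name)
       | (.lifecycle.preStop.exec.command // []) | join(" ")' <<<"${pod_json}" \
    | grep -oE '/[^ ]*drain[^ ]*stamp[^ ]*' || true)"

  # ... and create it on behalf of containers that never got the chance themselves.
  if [ -n "${stamp}" ]; then
    touch "${stamp}"
  fi
done
```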
@manno What do you think? It feels quite hacky, but on the other hand it should be pretty straightforward to add; everything else I can think of is much more complicated.
Remaining issues: it looks like we don't currently include `kubectl` in the images, so we need to figure out how to make that part work. And I haven't checked whether the service account has a role binding that allows `get pod` access from within the containers.
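For the RBAC part, a quick way to check (assuming `kubectl` does end up in the image; the namespace and service account names below are placeholders) would be:

```bash
# Inside the container: does the pod's own service account have the access the PoC needs?
kubectl auth can-i get pods
kubectl auth can-i list pods

# From outside the cluster, impersonating the pod's (placeholder) service account:
kubectl auth can-i get pods \
  --as=system:serviceaccount:<namespace>:<service-account>
```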