Why do we need this PR?
In our production Kubernetes clusters we observed the following problem.
When a worker pod was evicted, `pre-stop-hook.sh` was triggered as expected.
Using `fly -t infra workers` we could verify that the worker state changed to "retiring".
Shortly afterwards, however, the state changed back to "running" for an unknown reason.
As a result, the worker accepted new incoming Concourse jobs even though its pod was already in the Terminating state.
The worker then kept running until `pod.spec.terminationGracePeriodSeconds` had elapsed.
Changes proposed in this pull request
With this PR, the pre-stop hook sends the `shutdownSignal` repeatedly instead of only once, so that the worker state remains "retiring".
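The idea can be sketched as a small POSIX shell loop. This is a minimal sketch under stated assumptions, not the chart's actual template: the names `SHUTDOWN_SIGNAL`, `RESEND_INTERVAL`, `SENT` and `resend_until_exit` are hypothetical, `USR2` is used as the default because a Concourse worker retires on SIGUSR2, and the signal is assumed to go to PID 1 of the worker container. The loop has no timeout of its own because Kubernetes kills the container anyway once `terminationGracePeriodSeconds` has elapsed, and the time spent in the preStop hook counts against that period.

```shell
#!/bin/sh
# Hypothetical sketch of a pre-stop hook that re-sends the shutdown signal
# in a loop instead of once. SHUTDOWN_SIGNAL, RESEND_INTERVAL, SENT and
# resend_until_exit are illustrative names, not the chart's real template.

# SIGUSR2 asks a Concourse worker to retire (SIGUSR1 would land it).
SHUTDOWN_SIGNAL="${SHUTDOWN_SIGNAL:-USR2}"
# Seconds to wait between two sends.
RESEND_INTERVAL="${RESEND_INTERVAL:-5}"
# Number of signals successfully sent (handy when debugging the hook).
SENT=0

# Keep sending SHUTDOWN_SIGNAL to PID $1 until that process is gone, so a
# worker that flips back to "running" is pushed into "retiring" again.
resend_until_exit() {
  # kill -0 delivers no signal; it only checks that the process still exists.
  while kill -0 "$1" 2>/dev/null; do
    if kill -s "$SHUTDOWN_SIGNAL" "$1" 2>/dev/null; then
      SENT=$((SENT + 1))
    fi
    sleep "$RESEND_INTERVAL"
  done
}

# Only act when executed as the hook itself. Assumption: the worker
# receives its shutdown signal via PID 1 of its container.
case "$0" in
  *pre-stop-hook.sh) resend_until_exit 1 ;;
esac
```

The `case "$0"` guard runs the loop only when the file is executed as `pre-stop-hook.sh`, so the function can also be sourced and exercised against a dummy process outside the cluster.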
Contributor Checklist
[x] Which branch are you merging into?
dev is for changes related to the next release of Concourse (aka unpublished code on master in concourse/concourse)
Reviewer Checklist
This section is intended for the core maintainers only, to track review progress. Please do not fill out this section.
[ ] Code reviewed
[ ] Topgun tests run
[ ] Back-port if needed
[ ] Is the correct branch targeted? (master or dev)
Existing Issue
no