kolia closed this 3 years ago
> Keep in mind that the `restartPolicy` applies to the Pod, and not to the Job itself: there is no automatic Job restart once the Job status is `type: Failed`. That is, the Job termination mechanisms activated with `.spec.activeDeadlineSeconds` and `.spec.backoffLimit` result in a permanent Job failure that requires manual intervention to resolve.
>
> – https://kubernetes.io/docs/concepts/workloads/controllers/job/
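For illustration, a Job that has been stopped by its backoff limit reports a terminal condition roughly like the one below in its status. This is an abbreviated, hand-written excerpt (timestamps, messages and other fields omitted), not output captured from this cluster. A Job stopped by `.spec.activeDeadlineSeconds` should report `reason: DeadlineExceeded` instead.

```yaml
# Abbreviated excerpt of a failed Job's status (illustrative only).
status:
  failed: 1                      # number of pods that ended in phase Failed
  conditions:
  - type: Failed                 # terminal: the Job controller will not retry from here
    status: "True"
    reason: BackoffLimitExceeded # set when .spec.backoffLimit has been reached
```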
@omus any thoughts on how to make it so that `exit(42)` doesn't lead to a zombie?
`backoffLimit: 0` seems to be what you want in combination with `restartPolicy: Never`.
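A minimal sketch of a `batch/v1` Job spec that combines both settings. The Job name, image tag and container fields are hypothetical placeholders, not what `julia_pod` actually generates; only `backoffLimit` and `restartPolicy` are the settings under discussion.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: julia-pod-example      # hypothetical name
spec:
  backoffLimit: 0              # Job level: create no replacement pod after a pod fails
  template:
    spec:
      restartPolicy: Never     # Pod level: do not restart the failed container in place
      containers:
      - name: julia
        image: julia:1.6       # hypothetical image tag
        command: ["julia"]
        stdin: true            # keep stdin open so a REPL can be attached
        tty: true
```

With `restartPolicy: Never`, a non-zero exit such as `exit(42)` leaves the pod in the `Failed` phase instead of restarting the container, and `backoffLimit: 0` tells the Job controller not to spawn a new pod, so the Job should be marked `Failed` after the first failure rather than leaving an unattached pod behind.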
Currently, Julia pods that exit with a non-zero status trigger a Job retry, in which the Job re-spawns a pod that ends up being a zombie. Apparently setting `restartPolicy: Never` on the pod spec is not enough. Maybe setting the Job spec's `backoffLimit` will do the trick.

Repro: start a `julia_pod` and, in the Julia REPL, run `exit(42)`; the Job will respawn a pod that nobody is attached to, i.e. a zombie.