The Kubernetes Job controller only sets the completion time on a Job
when the Job's succeeded count is >= the Job's .spec.completions field
if that field is set, or >= 1 otherwise. Until then, the controller
keeps retrying the Job unless the number of retries has exceeded
.spec.backoffLimit.

Therefore a Job can have a failed count > 0 and a succeeded count > 0
while still counting as successful, contrary to what the previous
implementation of the object_stats check assumed.

This commit removes the requirement that the failed count be 0 for a
Job execution to be treated as successful. A completion timestamp, in
conjunction with no active pods and a succeeded count > 0, is
sufficient to determine that a Job completed successfully; see the Job
controller implementation [1].

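A minimal Python sketch contrasting the relaxed check with the previous
one. It assumes the Job status is a plain dict in its serialized JSON
form (camelCase keys such as completionTime), where zero-valued
counters are omitted. Both function names are hypothetical, and
had_completed_successfully_before is only a reconstruction of the old
behaviour from this message, not the actual previous code.

```python
from typing import Any, Mapping


def has_completed_successfully(status: Mapping[str, Any]) -> bool:
    """Relaxed check: completion timestamp set, nothing active, at least one success.

    The failed counter is deliberately ignored, because pods that failed and
    were retried before the Job succeeded still show up there.
    """
    return (
        status.get("completionTime") is not None
        and status.get("active", 0) == 0
        and status.get("succeeded", 0) > 0
    )


def had_completed_successfully_before(status: Mapping[str, Any]) -> bool:
    """Hypothetical reconstruction of the old check, which also required failed == 0."""
    return has_completed_successfully(status) and status.get("failed", 0) == 0


# A Job that succeeded on its third attempt.
retried = {
    "active": 0,
    "succeeded": 1,
    "failed": 2,
    "completionTime": "2019-01-01T00:00:00Z",
}
print(had_completed_successfully_before(retried))  # False
print(has_completed_successfully(retried))  # True
```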
Technically, the check could simply be the presence of the completion
timestamp, as that field is only set when the Job controller
determines that the Job has completed successfully, cf. the linked
code.

[1] https://github.com/kubernetes/kubernetes/blob/8211cabfb2bf3b2b531b13589843130cb47df1b1/pkg/controller/job/job_controller.go#L518-L570