Closed cibinsb closed 4 years ago
Right. But this is not just a fix; it changes the previous logic. One of the tests still says:
Checking if the job status is 'running' when the pod failed to start with a reason other than ImagePullBackOff.
Yet you removed the condition on ImagePullBackOff and changed the expected result of that test to Error.
If we need a timeout for reasons other than the one specified, it should be a separate value. It makes sense to fail fast on an image pull error (most likely an error in the request), but not so fast when the pod is waiting in a queue for resources.
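A minimal sketch of what a separate value could look like, assuming the pod's waiting reason and start time are already known. The constant names, the durations, and the `pending_pod_state` helper are hypothetical and not taken from this PR:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical values: a short limit for image pull problems,
# a much longer one for everything else (e.g. waiting for resources).
IMAGE_PULL_TIMEOUT = timedelta(minutes=1)
PENDING_TIMEOUT = timedelta(minutes=30)


def pending_pod_state(reason: Optional[str], start_time: datetime, now: datetime) -> str:
    """Return 'Error' once the limit for this waiting reason is exceeded, else 'running'."""
    waited = now - start_time
    # Pick the limit that matches the waiting reason.
    limit = IMAGE_PULL_TIMEOUT if reason == "ImagePullBackOff" else PENDING_TIMEOUT
    return "Error" if waited > limit else "running"


t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
state_pull = pending_pod_state("ImagePullBackOff", t0, t0 + timedelta(minutes=5))
state_queue = pending_pod_state("Unschedulable", t0, t0 + timedelta(minutes=5))
```

With these assumed limits, a pod stuck on ImagePullBackOff for five minutes is reported as failed, while a pod that has waited the same time for another reason is still reported as running.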
Reverted to the previous logic, please review.
Initially I assumed that the `pod.status.start_time` value would always be present when polling the status of the pod. However, there were cases where its value was `None` (to reproduce the error, delete the storage class in Kubernetes). To prevent this bug, an additional check was necessary. A condition has now been added, and the generic error message is logged in that case.
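The kind of guard described above can be sketched as follows. This is only an illustration, not the code from this PR: the `pod_wait_time` helper and the log message are hypothetical, and `SimpleNamespace` stands in for the pod object returned by the Kubernetes client.

```python
import logging
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace
from typing import Optional

logger = logging.getLogger(__name__)


def pod_wait_time(pod, now: datetime) -> Optional[timedelta]:
    """Return how long the pod has existed, or None if start_time is not set."""
    start_time = getattr(pod.status, "start_time", None)
    if start_time is None:
        # start_time can be missing, e.g. when the pod was never scheduled.
        logger.error("Pod status has no start_time; cannot compute waiting time")
        return None
    return now - start_time


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
started = SimpleNamespace(status=SimpleNamespace(start_time=now - timedelta(minutes=3)))
not_started = SimpleNamespace(status=SimpleNamespace(start_time=None))
```

Returning `None` instead of raising lets the caller decide how to report the job state when the start time is unknown.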