elixir-cloud-aai / tesk-core

Python code that is launched as images into the Kubernetes cluster by tesk-api.
Apache License 2.0

Bug Fix: When polling pod.status.start_time. #35

Closed by cibinsb 4 years ago

cibinsb commented 4 years ago

Initially I assumed that pod.status.start_time would always be set when polling the status of the pod. However, there are cases where its value is None (to reproduce the error, delete the storage class in Kubernetes). An additional check was necessary to prevent this bug. A condition has now been added, and a generic error message is logged in that case.
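A minimal sketch of the kind of guard described above, not the actual tesk-core code. The helper name seconds_since_start and the stand-in pod objects are hypothetical. The only assumption is that the pod exposes status.start_time as either None or a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace
from typing import Optional


def seconds_since_start(pod, now: Optional[datetime] = None) -> Optional[float]:
    """Return the seconds elapsed since the pod started.

    Returns None when pod.status.start_time is not set yet, so the
    caller can handle that case instead of failing on None arithmetic.
    (Hypothetical helper, for illustration only.)
    """
    start_time = pod.status.start_time
    if start_time is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - start_time).total_seconds()


# Stand-in objects that mimic the shape of a pod returned by the API.
now = datetime(2020, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
not_started = SimpleNamespace(status=SimpleNamespace(start_time=None))
started = SimpleNamespace(
    status=SimpleNamespace(start_time=now - timedelta(seconds=30))
)

elapsed_not_started = seconds_since_start(not_started, now)
elapsed_started = seconds_since_start(started, now)
```

With this shape, the polling loop can branch on a None result (for example, log the generic error message) rather than raising a TypeError when subtracting from None.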

aniewielska commented 4 years ago

Right. But this is not just a fix; it changes the previous logic. One of the tests even still says:

        Checking if the job status is 'running' when the pod failed to start with a reason other than ImagePullBackOff.

and then you removed the condition on ImagePullBackOff and changed the test result to Error.

If we need a timeout for reasons other than the one specified, it should rather be a separate value. It makes sense to fail fast on imagePullError (it is likely an error in the request), but not so fast when waiting in a queue for resources.
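A rough sketch of what a separate timeout value could look like, under stated assumptions. The constant names, the timeout values, the function status_while_waiting, and the lowercase status strings are all hypothetical and not taken from tesk-core. ImagePullBackOff and ErrImagePull are the standard Kubernetes container waiting reasons for a failed image pull.

```python
# Hypothetical values: a short timeout for image pull failures and a
# much longer one for every other waiting reason (e.g. queued for resources).
IMAGE_PULL_TIMEOUT_S = 120
OTHER_WAITING_TIMEOUT_S = 3600

# Waiting reasons that most likely indicate an error in the request.
FAIL_FAST_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def status_while_waiting(reason: str, waited_s: float) -> str:
    """Map a container waiting reason and the time waited to a job status.

    Image pull failures use the short timeout; any other reason uses
    the long one, so a pod queued for resources is not failed early.
    """
    if reason in FAIL_FAST_REASONS:
        timeout_s = IMAGE_PULL_TIMEOUT_S
    else:
        timeout_s = OTHER_WAITING_TIMEOUT_S
    return "error" if waited_s > timeout_s else "running"


# After 200 seconds an image pull failure is an error,
# while a pod still being created keeps reporting running.
image_pull_status = status_while_waiting("ImagePullBackOff", 200)
creating_status = status_while_waiting("ContainerCreating", 200)
```

Keeping the two timeouts as separate values lets each be tuned independently, without changing the existing behaviour for ImagePullBackOff.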

cibinsb commented 4 years ago

Reverted to the previous logic, please review.