elixir-cloud-aai / TESK

GA4GH Task Execution Service Root Project + Deployment scripts on Kubernetes
https://tesk.readthedocs.io
Apache License 2.0

How to clean up failed jobs and pods #109

Open lvarin opened 4 years ago

lvarin commented 4 years ago

Hello, we have run into an issue in TESK: we put the wrong tag for an image in a workflow, and now the pod that has to execute it is stuck in "ImagePullBackOff". What is the best way to solve this?

We are a bit stuck because of this. The general problem is that when a task fails and another user created it, we can only delete it directly with kubectl/oc. I am missing a way to see the status of the cluster from an admin perspective, so that I can clean up tasks quickly and leave the cluster ready for more tests. Right now I lose a lot of time checking jobs and pods.

Regards

aniewielska commented 4 years ago

The particular problem of "ImagePullBackOff" should already have been solved here: https://github.com/EMBL-EBI-TSI/tesk-core/releases/tag/v8.1.0 But more generally, how would you solve "seeing the status of the cluster from an admin perspective"? With an admin API in TESK to see all tasks (which should already be possible) and to delete tasks from outside of K8s (which is not covered by TES, but is a good idea)?
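
Until something like an admin API exists, the manual check of jobs and pods described above could be scripted roughly as follows. This is only a sketch, not part of TESK: it assumes `kubectl` (or `oc`, which accepts the same arguments here) is on the PATH, that the tasks run in a namespace called `tesk` (a hypothetical name), and that each task pod is owned by a Kubernetes Job. The function names are made up for illustration.

```python
import json
import subprocess

# Waiting reasons that indicate a pod cannot pull its image.
STUCK_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def load_pods(namespace):
    """Fetch all pods of a namespace as parsed JSON via `kubectl get pods -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def owning_job(pod):
    """Return the name of the Job that owns the pod, or None if there is none."""
    for ref in pod.get("metadata", {}).get("ownerReferences") or []:
        if ref.get("kind") == "Job":
            return ref.get("name")
    return None


def stuck_pods(pod_list):
    """Return (namespace, pod, reason, job) for pods stuck on an image pull."""
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for container in statuses:
            waiting = container.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in STUCK_REASONS:
                meta = pod["metadata"]
                found.append(
                    (meta.get("namespace"), meta["name"],
                     waiting["reason"], owning_job(pod))
                )
                break  # one stuck container is enough to report the pod
    return found


# Example usage (assumes the hypothetical namespace "tesk"):
#   for ns, pod, reason, job in stuck_pods(load_pods("tesk")):
#       print(f"{ns}/{pod}: {reason} (job: {job})")
```

The script only reports; it deliberately does not delete anything. An admin could then remove a reported Job with `kubectl delete job <name> -n <namespace>`, which by default also removes the pods that Job created.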