armadaproject / armada

A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.
https://armadaproject.io
Apache License 2.0
480 stars 135 forks

Make it easier to figure out why Armada jobs are stuck in the queue #703

Closed robertdavidsmith closed 1 year ago

robertdavidsmith commented 3 years ago

On our HTCondor farm, the condor_q -better-analyze command makes it easy to tell why jobs aren't being scheduled.

On Armada, there is no equivalent way to see why a job is stuck in the queue.

Make this easy to find out. It's a very common on-call question with HTCondor. It's not immediately clear how to design this, so we should think carefully first rather than blindly copying HTCondor.


Sharpz7 commented 1 year ago

Closing as no longer needed.