Currently, ClearML queues can run on any of the K8S nodes. As a result, pods from the 4-GPU queues cannot start a job when each of the nodes already has at least one GPU in use.
As @jax79sg proposed, we need to configure the affinity of ClearML queues to specific worker nodes.
Discuss configuration for affinity of pods to nodes.
Label K8S worker nodes.
Configure nodeSelector/node affinity and redeploy ClearML glues.
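The labelling and nodeSelector steps above could be sketched roughly as follows. This is only an illustration, not the agreed configuration: the queue names, node names, and the `clearml-queue` label key are all hypothetical placeholders, and the real mapping should come from the configuration discussion. The script prints the `kubectl label nodes` commands for each node and a pod-spec fragment with a matching `nodeSelector` for each queue.

```python
import json

# Hypothetical queue -> worker node mapping; replace with the agreed configuration.
QUEUE_NODES = {
    "queue-4gpu": ["worker-1"],
    "queue-1gpu": ["worker-2", "worker-3"],
}

# Hypothetical label key used to tie a node to a ClearML queue.
LABEL_KEY = "clearml-queue"


def label_commands(queue_nodes, key=LABEL_KEY):
    """Return one `kubectl label nodes` command per (queue, node) pair."""
    return [
        f"kubectl label nodes {node} {key}={queue}"
        for queue, nodes in sorted(queue_nodes.items())
        for node in nodes
    ]


def pod_template(queue, key=LABEL_KEY):
    """Return a pod-spec fragment that pins a queue's pods to its labelled nodes."""
    return {"spec": {"nodeSelector": {key: queue}}}


if __name__ == "__main__":
    for command in label_commands(QUEUE_NODES):
        print(command)
    for queue in sorted(QUEUE_NODES):
        print(queue, json.dumps(pod_template(queue)))
```

With a plain `nodeSelector`, pods of a queue are scheduled only on nodes carrying that queue's label, so dedicating a node to the 4-GPU queue keeps smaller jobs from occupying its GPUs. Node affinity would be the alternative if softer (preferred) placement rules are wanted.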
Configuration: https://docs.google.com/spreadsheets/d/1DESbljncKSuIzZ0osirxbLh3tcXklWcuoHpWG2Ly00Q/edit?usp=sharing