armadaproject / armada

A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.
https://armadaproject.io
Apache License 2.0

AutoScaling Worker Nodes #533

Open beingamarnath opened 3 years ago

beingamarnath commented 3 years ago

I'm curious to understand how autoscaling of nodes can be achieved with Armada. I cannot find anything on scalability in the documentation either. As far as I understand, the Cluster Autoscaler works when there are pending pods (as replicas from a Deployment or as batch jobs). If this can be achieved, how do we ensure the Cluster Autoscaler doesn't remove nodes until the jobs running in the pods have completed?


jankaspar commented 3 years ago

Hi, this is a good question.

For autoscaling to work properly with Armada, the autoscaler would have to scale the cluster based on queued jobs rather than pending pods. Armada queues jobs outside the Kubernetes cluster and creates pods only when resources are available, so the Cluster Autoscaler might not see any reason to scale the cluster up. We will be looking into this in the future.

To answer your second question, I think you could prevent the autoscaler from scaling down nodes that are still running jobs by specifying a restrictive PodDisruptionBudget.
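For example, a minimal sketch of such a PodDisruptionBudget (the `app: batch-job` label is hypothetical; you would need to attach a matching label to your Armada jobs):

```yaml
# Sketch only: a PodDisruptionBudget that blocks all voluntary
# disruptions (including autoscaler node drain) for pods carrying
# the hypothetical label app: batch-job.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: protect-batch-jobs
spec:
  maxUnavailable: 0   # disallow any voluntary eviction
  selector:
    matchLabels:
      app: batch-job
```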

fellhorn commented 3 years ago

@beingamarnath You might look for this job annotation:

cluster-autoscaler.kubernetes.io/safe-to-evict: 'false'

to tell the autoscaler not to evict your pod.
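For example, a minimal sketch of a job submission file carrying that annotation (the queue and jobSetId values are placeholders, and whether annotations propagate to the pod this way should be verified against your Armada version):

```yaml
# Sketch of an Armada job file for `armadactl submit`; verify the
# annotation handling against your Armada version.
queue: example-queue
jobSetId: example-job-set
jobs:
  - priority: 0
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    podSpec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox:1.36
          command: ["sleep", "60"]
```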

We are also interested in using Armada in a setup with node autoscaling. Is there already a way to extract queue information from Armada and feed it into a monitoring system? In GCP there might then already be a way to scale the instance group based on this metric [1]. It could be a bit hacky but might work.

[1] https://cloud.google.com/architecture/autoscaling-instance-group-with-custom-stackdrivers-metric

jankaspar commented 3 years ago

Hi @fellhorn, we lack more detailed documentation on this, but Armada exports various Prometheus metrics you could use, for example armada_queue_size or armada_queue_resource_queued (https://github.com/G-Research/armada/blob/master/docs/production-install.md#metrics, https://github.com/G-Research/armada/blob/master/internal/armada/metrics/metrics.go#L46-L198).
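For example, a minimal sketch of a Prometheus recording rule over one of these metrics, which could then feed an external autoscaling pipeline such as the GCP setup linked above (the rule and group names are illustrative; the queueName label should be checked against the metrics code linked here):

```yaml
# Sketch of a Prometheus recording rule aggregating queued work per
# queue; the rule and group names are illustrative.
groups:
  - name: armada-autoscaling
    rules:
      - record: armada:queue_size:sum
        expr: sum by (queueName) (armada_queue_size)
```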

max-sixty commented 1 year ago

We're considering Armada, but something like this would be important for us, as we're heavy users of GKE autoscaling.

Has anyone found a reasonable workaround? Either using something more sophisticated, such as custom metrics, or something naive?

For example, in the naive category: adding a stub workload to the cluster that Armada doesn't know about, so that Armada still schedules new work onto the cluster, which then causes GKE to scale the cluster up?
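One existing pattern along these lines is the low-priority "balloon" (overprovisioning) deployment, sketched below; it is untested with Armada, and all names and sizes here are placeholders. The balloon pods hold spare capacity, real workloads preempt them, and the re-pending balloon pods trigger the autoscaler to add nodes:

```yaml
# Sketch of the standard cluster-overprovisioning pattern; untested
# with Armada. Negative priority lets any real workload preempt the
# placeholder pods.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: balloon
value: -10
globalDefault: false
description: "Placeholder pods that any real workload may preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: balloon
spec:
  replicas: 2
  selector:
    matchLabels:
      app: balloon
  template:
    metadata:
      labels:
        app: balloon
    spec:
      priorityClassName: balloon
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"      # sizes are placeholders; match your node shape
              memory: 1Gi
```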

dejanzele commented 1 year ago

Hi @max-sixty,

Thanks for reaching out and considering using Armada! It's true that, as of now, Armada doesn't inherently support cluster autoscalers (either Kubernetes-native or vendor scalers), because the executor assumes a fixed resource pool and as such does not allow overscheduling of workloads.

We're definitely open to collaborating on this topic to enhance the product. We'd be happy to assist you in creating a design document and could also provide some degree of assistance during the implementation phase of this feature in the executor component.

Armada Server & Executor already expose a couple of useful metrics around queued jobs and available resources. Part of the solution would be for the Armada Server to know that an autoscaler exists and that it is perfectly fine to overschedule to some degree, which could probably be configured by adding support for overscheduling conditions. By allowing overscheduling, pods would be submitted to the executor clusters and go into the Pending state, and that should trigger the autoscaler to request a new node for the cluster.
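Purely as an illustration of the idea, such a knob might look something like the sketch below; none of these configuration fields exist in Armada today, and every name here is hypothetical:

```yaml
# Hypothetical executor configuration; no field here exists in Armada
# today. Shown only to illustrate the overscheduling condition idea.
kubernetes:
  overscheduling:
    enabled: true
    # allow pending pods up to 20% beyond current cluster capacity
    maxOverschedulingFactor: 1.2
```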

In the meantime, we would also appreciate any feedback or experiences you could share about your attempts at this workaround or any other approaches you might have tried. It's through the feedback and ideas from users like you that we're able to improve Armada to suit a wider variety of use cases.

max-sixty commented 1 year ago

Thanks for the response, @dejanzele.

We're right at the beginning of our exploration, so we'll take some time to evaluate options. I appreciate the openness; it would be great to collaborate.