himanshu-kun opened this issue 2 years ago
@himanshu-kun Label area/auotscaling does not exist.
The labels can be added by reading the machineClass.nodeTemplate.capacity.gpu field. The only problem is that MCM currently does not reconcile a machine immediately when an event occurs on its machineClass. This needs to be done, as also stated in issue https://github.com/gardener/machine-controller-manager/issues/517
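A minimal sketch of deriving the label from the node template, assuming a simplified, hypothetical NodeTemplate type (the real MCM types differ) and a caller-supplied label value, since the issue does not say which value the label should carry. The value "nvidia-tesla-t4" below is purely illustrative.

```go
package main

import "fmt"

// acceleratorLabel is the label key proposed in this issue.
const acceleratorLabel = "worker.gardener.cloud/accelerator"

// NodeTemplate is a hypothetical, simplified stand-in for
// machineClass.nodeTemplate; the real MCM types differ.
type NodeTemplate struct {
	// Capacity maps resource names (e.g. "gpu") to quantities.
	Capacity map[string]int64
}

// gpuLabels returns the labels to add to a node created from tmpl.
// The label value is supplied by the caller (hypothetical: the issue
// does not specify which value should be used).
func gpuLabels(tmpl NodeTemplate, value string) map[string]string {
	labels := map[string]string{}
	if tmpl.Capacity["gpu"] > 0 {
		labels[acceleratorLabel] = value
	}
	return labels
}

func main() {
	gpuTmpl := NodeTemplate{Capacity: map[string]int64{"cpu": 8, "gpu": 1}}
	cpuTmpl := NodeTemplate{Capacity: map[string]int64{"cpu": 8}}
	fmt.Println(gpuLabels(gpuTmpl, "nvidia-tesla-t4")) // labeled
	fmt.Println(gpuLabels(cpuTmpl, "nvidia-tesla-t4")) // no labels
}
```

Because the labels are computed from the machineClass, a change to nodeTemplate.capacity.gpu only reaches existing nodes once MCM reconciles the affected machines, which is the gap tracked in issue 517.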
The autoscaler can distinguish GPU nodes by the GPU label that the cloud provider implementation defines (the autoscaler learns the label through the interface method GPULabel()). For GPU nodes it then calculates only GPU utilization and applies a separate threshold to them.
Refer to https://github.com/gardener/autoscaler/blob/dacb105216e2fe6d49e801e8f36cdaf1b8f0a7da/cluster-autoscaler/core/scale_down.go#L638-L652
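The utilization rule described above can be sketched as follows. This is a simplified illustration, not the autoscaler's code: the Node type is hypothetical, the GPU label key is assumed to be the one proposed in this issue, and non-GPU utilization is taken as the larger of CPU and memory utilization. Nodes whose utilization cannot be computed (for example, GPUs not yet advertised) are conservatively never reported as underutilized.

```go
package main

import "fmt"

// gpuLabel stands in for the key returned by the provider's GPULabel()
// method; assumed here to be the label proposed in this issue.
const gpuLabel = "worker.gardener.cloud/accelerator"

// Node is a hypothetical, simplified view of a node's resources.
type Node struct {
	Labels         map[string]string
	CPURequested   float64
	CPUAllocatable float64
	MemRequested   float64
	MemAllocatable float64
	GPURequested   float64
	GPUAllocatable float64
}

// ratio returns requested/allocatable, or ok=false if nothing is allocatable.
func ratio(requested, allocatable float64) (float64, bool) {
	if allocatable <= 0 {
		return 0, false
	}
	return requested / allocatable, true
}

// utilization measures GPU-labeled nodes only by GPU utilization and all
// other nodes by the larger of CPU and memory utilization.
func utilization(n Node) (util float64, isGPU bool, ok bool) {
	if _, hasLabel := n.Labels[gpuLabel]; hasLabel {
		util, ok = ratio(n.GPURequested, n.GPUAllocatable)
		return util, true, ok
	}
	cpu, okCPU := ratio(n.CPURequested, n.CPUAllocatable)
	mem, okMem := ratio(n.MemRequested, n.MemAllocatable)
	if !okCPU || !okMem {
		return 0, false, false
	}
	if cpu > mem {
		return cpu, false, true
	}
	return mem, false, true
}

// underutilized applies a separate threshold to GPU nodes.
func underutilized(n Node, threshold, gpuThreshold float64) bool {
	util, isGPU, ok := utilization(n)
	if !ok {
		return false // cannot judge; keep the node
	}
	if isGPU {
		return util < gpuThreshold
	}
	return util < threshold
}

func main() {
	gpuNode := Node{
		Labels:       map[string]string{gpuLabel: "nvidia-tesla-t4"},
		CPURequested: 7, CPUAllocatable: 8, // busy CPU is ignored for GPU nodes
		MemRequested: 1, MemAllocatable: 32,
		GPURequested: 0, GPUAllocatable: 1,
	}
	fmt.Println(underutilized(gpuNode, 0.5, 0.5))
}
```

The point of the separate rule is that an expensive GPU node running only CPU-heavy pods still counts as idle, so it can be replaced by a cheaper node.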
How to categorize this issue?
/area auto-scaling
/kind enhancement
/priority 3
What would you like to be added: Label GPU nodes with the autoscaler GPU label "worker.gardener.cloud/accelerator".
Why is this needed: Several reasons:
With the label, the autoscaler considers GPU nodes NotReady (even if they are Ready in reality) until the nodes advertise their gpu resources, i.e. until the GPU drivers are installed. This helps when scaleDownUnneededTime=0, which customers often set in their clusters to support blue-green rollouts during the maintenance window. For details on the use case, refer to https://sap-ti.slack.com/archives/C9CEBQPGE/p1700465892059819