gardener / machine-controller-manager

Declarative way of managing machines for Kubernetes cluster
Apache License 2.0

Label gpu nodes #727

Open himanshu-kun opened 2 years ago

himanshu-kun commented 2 years ago

How to categorize this issue?

/area auto-scaling /kind enhancement /priority 3

What would you like to be added: Label nodes that have GPUs with the autoscaler's GPU label, "worker.gardener.cloud/accelerator".

Why is this needed: Several reasons:

gardener-robot commented 2 years ago

@himanshu-kun Label area/auotscaling does not exist.

himanshu-kun commented 2 years ago

The labels can be added by looking at the machineClass.nodeTemplate.capacity.gpu field. The only problem is that MCM currently does not reconcile machines immediately when an event occurs on the machineClass. That needs to be addressed as well, as stated in https://github.com/gardener/machine-controller-manager/issues/517
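A rough sketch of the labelling idea, not MCM code: `NodeTemplate`, `gpuLabels`, and the `labelValue` parameter are hypothetical names made up for illustration, and capacity is modelled as plain integer strings rather than Kubernetes resource quantities. The label value is left to the caller because nothing above says what it should be.

```go
package main

import "strconv"

// AcceleratorLabel is the label key proposed in this issue.
const AcceleratorLabel = "worker.gardener.cloud/accelerator"

// NodeTemplate is a hypothetical, simplified stand-in for
// machineClass.nodeTemplate; it is not the real MCM type.
type NodeTemplate struct {
	// Capacity maps resource names to quantities, e.g. "gpu": "1".
	Capacity map[string]string
}

// gpuLabels returns the labels to add to a node created from tmpl.
// The result is empty unless capacity.gpu parses as a positive integer.
// labelValue is an assumption: the issue does not say what value to set.
func gpuLabels(tmpl NodeTemplate, labelValue string) map[string]string {
	labels := map[string]string{}
	raw, ok := tmpl.Capacity["gpu"]
	if !ok {
		return labels
	}
	count, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || count <= 0 {
		return labels
	}
	labels[AcceleratorLabel] = labelValue
	return labels
}
```

With this shape, a template with `"gpu": "2"` would yield the accelerator label, while a template with no gpu entry or `"gpu": "0"` would yield no labels.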

himanshu-kun commented 1 year ago

The autoscaler could segregate GPU nodes based on the GPU label that the implementation defines (the autoscaler gets to know about it through the interface method GPULabel()). It then calculates only GPU utilization for the GPU nodes and has a different threshold defined for them. Refer to https://github.com/gardener/autoscaler/blob/dacb105216e2fe6d49e801e8f36cdaf1b8f0a7da/cluster-autoscaler/core/scale_down.go#L638-L652
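To illustrate the behaviour described above, here is a simplified sketch, not the autoscaler's actual code. The `Node` type, its fields, and the function names are hypothetical, and the non-GPU branch (taking the larger of CPU and memory utilization) is an assumption about how regular nodes are scored.

```go
package main

// gpuLabel is the key the provider would return from GPULabel().
const gpuLabel = "worker.gardener.cloud/accelerator"

// Node is a hypothetical, simplified view of a node; the real
// autoscaler works on Kubernetes node and pod objects instead.
type Node struct {
	Labels         map[string]string
	GPURequested   int64
	GPUAllocatable int64
	CPUUtil        float64 // fraction of CPU requested, 0..1
	MemUtil        float64 // fraction of memory requested, 0..1
}

// utilization returns the value used for the scale-down decision and
// whether the node was treated as a GPU node. For nodes carrying the
// GPU label, only GPU utilization counts.
func utilization(n Node) (util float64, isGPU bool) {
	if _, ok := n.Labels[gpuLabel]; ok && n.GPUAllocatable > 0 {
		return float64(n.GPURequested) / float64(n.GPUAllocatable), true
	}
	// Assumption: non-GPU nodes use the larger of CPU and memory.
	if n.CPUUtil > n.MemUtil {
		return n.CPUUtil, false
	}
	return n.MemUtil, false
}

// belowThreshold reports whether the node is a scale-down candidate,
// applying the separate GPU threshold to GPU nodes.
func belowThreshold(n Node, threshold, gpuThreshold float64) bool {
	util, isGPU := utilization(n)
	if isGPU {
		return util < gpuThreshold
	}
	return util < threshold
}
```

In this sketch, a labelled node with 1 of 4 GPUs requested would count as underutilized at a GPU threshold of 0.5 even if its CPU is busy. Without the label, the same node would be judged on CPU and memory instead, which is why the label matters.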