gardener / autoscaler

Customised fork of cluster-autoscaler to support machine-controller-manager
Apache License 2.0

Support extended resources and ephemeral-storage for scale-from-zero specified in MachineClass NodeTemplate #334

Closed elankath closed 5 days ago

elankath commented 2 weeks ago

What this PR does / why we need it:

Currently, in scale-from-zero cases, the autoscaler does not respect the ephemeral-storage and extended resources specified in the MachineClass.NodeTemplate. Only the standard cpu, gpu and memory resources are picked up; custom extended resources are ignored entirely.
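
To illustrate the intended behaviour, here is a minimal Go sketch of building a template node's capacity from a node template. It is not the code in this PR: `ResourceList`, `isExtendedResource` and `templateCapacity` are hypothetical names, quantities are plain integers rather than Kubernetes quantities, and the extended-resource check is a simplified heuristic (a domain-prefixed name outside `kubernetes.io`).

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceList is a simplified, hypothetical stand-in for a Kubernetes
// resource list: resource name -> integer quantity.
type ResourceList map[string]int64

// isExtendedResource is a simplified heuristic (an assumption, not the
// real Kubernetes helper): a name counts as extended if it has a domain
// prefix and that domain is not under kubernetes.io.
func isExtendedResource(name string) bool {
	domain, _, found := strings.Cut(name, "/")
	if !found {
		return false
	}
	return domain != "kubernetes.io" && !strings.HasSuffix(domain, ".kubernetes.io")
}

// templateCapacity copies the standard resources, ephemeral-storage and
// any extended resources from the node template into the capacity that a
// scale-from-zero template node would advertise.
func templateCapacity(nodeTemplate ResourceList) ResourceList {
	capacity := ResourceList{}
	for name, qty := range nodeTemplate {
		switch {
		case name == "cpu", name == "gpu", name == "memory":
			capacity[name] = qty // the resources already picked up
		case name == "ephemeral-storage":
			capacity[name] = qty // previously dropped
		case isExtendedResource(name):
			capacity[name] = qty // previously ignored
		}
	}
	return capacity
}

func main() {
	const gi = int64(1) << 30
	nodeTemplate := ResourceList{
		"cpu":                 8,
		"memory":              7 * gi,
		"ephemeral-storage":   50 * gi,
		"resource.com/dongle": 6,
	}
	capacity := templateCapacity(nodeTemplate)
	fmt.Println(capacity["ephemeral-storage"] == 50*gi, capacity["resource.com/dongle"])
}
```

The point of the sketch is only that ephemeral-storage and extended resources end up in the template node's capacity alongside cpu, gpu and memory, so the scheduler simulation can see them.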

Which issue(s) this PR fixes: Fixes #132

Special notes for your reviewer:

Release note:

Support extended resources and ephemeral-storage for scale-from-zero specified in MachineClass NodeTemplate
elankath commented 1 week ago

Corrected the issues. I will test scale-from-zero for ephemeral storage and custom resources and add a manual test log tomorrow. I am unsure how to write an integration test for this, though.

elankath commented 6 days ago

Test log for scale-from-zero with a custom resource named resource.com/dongle


apiVersion: v1
kind: Pod
metadata:
  name: testexres1
spec:
  containers:
    - name: example-container
      image: busybox
      command: ["sh", "-c", "echo Using extended resources && sleep 3600"]
      resources:
        limits:
          resource.com/dongle: 2
        requests:
          resource.com/dongle: 2

Node Group b has this specified:


        providerConfig:
          apiVersion: aws.provider.extensions.gardener.cloud/v1alpha1
          kind: WorkerConfig
          nodeTemplate:
            capacity:
              cpu: 8
              ephemeral-storage: 50Gi
              gpu: 0
              hana.hc.sap.com/hcu/cpu: 20
              hana.hc.sap.com/hcu/memory: 10
              memory: 7Gi
              resource.com/dongle: 6

Scale from zero triggered.


I1119 13:37:51.654540   38814 mcm_manager.go:981] Copying extended resources map[hana.hc.sap.com/hcu/cpu:{{20 0} {<nil>} 20 DecimalSI} hana.hc.sap.com/hcu/memory:{{10 0} {<nil>} 10 DecimalSI} resource.com/dongle:{{6 0} {<nil>} 6 DecimalSI}] to template node.Status.Capacity

I1119 13:37:41.141731   38814 klogx.go:87] Pod default/testexres1 can be moved to template-node-for-shoot--i062009--abc-b-z1-5762866181449073921-upcoming-0

Normal   TriggeredScaleUp   37s    cluster-autoscaler  pod triggered scale-up: [{shoot--i062009--abc-b-z1 0->1 (max: 1)}
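
The outcome in this log can be read as a simple fit check: the pod requests 2 of resource.com/dongle and the template node advertises 6. The Go sketch below is a rough illustration under stated assumptions, not the autoscaler's actual scheduling simulation; `ResourceList` and `fits` are hypothetical names and quantities are plain integers.

```go
package main

import "fmt"

// ResourceList is a simplified, hypothetical stand-in for a Kubernetes
// resource list: resource name -> integer quantity.
type ResourceList map[string]int64

// fits reports whether every requested resource is available in capacity.
// A resource that is missing from capacity counts as zero.
func fits(requests, capacity ResourceList) bool {
	for name, want := range requests {
		if capacity[name] < want {
			return false
		}
	}
	return true
}

func main() {
	// Requests taken from the test pod above.
	requests := ResourceList{"resource.com/dongle": 2}

	// Template capacity with the extended resource copied over.
	withExtended := ResourceList{"cpu": 8, "resource.com/dongle": 6}
	// Template capacity when extended resources are ignored.
	withoutExtended := ResourceList{"cpu": 8}

	fmt.Println(fits(requests, withExtended))
	fmt.Println(fits(requests, withoutExtended))
}
```

With the extended resource present in the template capacity the pod fits, which matches the 0->1 scale-up; without it the request can never be satisfied by a template node, so no scale-up from zero would be triggered.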
dhague commented 5 days ago

Looking forward to using this feature - thanks!