gardener / gardener-extension-os-suse-chost

Gardener extension controller for the SUSE Container Host operating system (CHost).
https://gardener.cloud
Apache License 2.0

Hugepages of nodes must be configurable in order for pods to use them #64

Open MartinKolbAtWork opened 2 years ago

MartinKolbAtWork commented 2 years ago

Kubernetes pods have long been able to request and use memory backed by hugepages (beta since Kubernetes 1.10, GA since 1.14):

apiVersion: v1
kind: Pod
metadata:
  name: huge-pages-example
spec:
  containers:
  - name: example
    image: fedora:latest
    resources:
      limits:
        hugepages-2Mi: 100Mi
        hugepages-1Gi: 2Gi

Gardener currently lacks a way to configure hugepages at the operating-system level. Ideally, this would be configurable in Gardener in a uniform way across all operating systems.

Other Kubernetes offerings, for example Amazon EKS, allow OS commands to be run before the kubelet of a node is started (see preBootstrapCommands below).

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-hugepages
  region: eu-central-1
managedNodeGroups:
  - name: my-nodes-group
    instanceType: c5d.large
    desiredCapacity: 1
    minSize: 1
    maxSize: 2
    volumeSize: 50
    volumeType: gp2
    preBootstrapCommands:
      - echo 64 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

A hook like this would allow hugepages to be configured in a completely generic way.
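To illustrate what such a pre-kubelet step might do beyond a bare echo, here is a minimal shell sketch. The function name set_hugepages and the SYSFS_ROOT override are hypothetical and exist only for this illustration; the sysfs path is the one from the eksctl example. Because the kernel may reserve fewer pages than requested when memory is fragmented, the sketch reads the value back rather than assuming the write succeeded.

```shell
#!/bin/sh
# Sketch only: reserve hugepages on one NUMA node before the kubelet starts.
# SYSFS_ROOT is a hypothetical override so the function can be tried against
# a scratch directory; on a real node it defaults to /sys.

set_hugepages() {
    count="$1"            # number of pages to request
    node="${2:-node0}"    # NUMA node directory name under sysfs
    size="${3:-2048kB}"   # page size as it is named in sysfs
    file="${SYSFS_ROOT:-/sys}/devices/system/node/${node}/hugepages/hugepages-${size}/nr_hugepages"

    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "$count" > "$file" || return 1

    # Read the value back: the kernel may have allocated fewer pages.
    actual="$(cat "$file")"
    if [ "$actual" -lt "$count" ]; then
        echo "requested $count hugepages, got $actual" >&2
        return 1
    fi
}
```

On a node, this would be called from a root shell as `set_hugepages 64`, which targets the same file as the command above.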

Alternatively, a configuration handled by Gardener's OS-specific extension controllers could offer the same functionality in a more controlled way.
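One way such a controller-managed setting could reach the node is as a systemd oneshot unit ordered before the kubelet. The sketch below only renders the unit text. It does not claim to reflect how the Gardener extensions work: the function name render_hugepages_unit is hypothetical, and kubelet.service as the kubelet's unit name is an assumption. It uses the system-wide 2 MiB pool under /sys/kernel/mm/hugepages.

```shell
#!/bin/sh
# Sketch only: print a systemd unit that reserves hugepages before the kubelet.
# The unit name kubelet.service is an assumption about the node image.

render_hugepages_unit() {
    count="$1"

    # Accept only a non-empty string of digits, since the value is
    # embedded into a shell command inside the unit.
    case "$count" in
        ''|*[!0-9]*)
            echo "hugepage count must be a non-negative integer" >&2
            return 1
            ;;
    esac

    cat <<EOF
[Unit]
Description=Reserve ${count} 2 MiB hugepages before the kubelet starts
Before=kubelet.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c 'echo ${count} > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages'

[Install]
WantedBy=multi-user.target
EOF
}
```

Rendering the unit from a single validated number keeps the user-facing surface small, which is the "more controlled" trade-off compared with arbitrary pre-bootstrap commands.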

See also the same issue for GardenLinux: https://github.com/gardener/gardener-extension-os-gardenlinux/issues/58

Garfield96 commented 11 months ago

In addition to hugepages, it would also be useful to be able to adjust cpupower settings and to configure kernel samepage merging (KSM) via /sys/kernel/mm/ksm/run. A failure to change the cpupower settings should not prevent the node from becoming ready. Some applications also benefit from a clock source other than the OS default, so being able to set /sys/devices/system/clocksource/clocksource0/current_clocksource to a custom value is desirable.
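A rough shell sketch of these settings applied on a best-effort basis follows. The function names and the SYSFS_ROOT and CPUPOWER overrides are hypothetical; the two sysfs paths are the ones named above, and `cpupower frequency-set -g` is the standard way to select a governor. Every step only warns on failure, so a node where cpupower is missing or unsupported would still proceed.

```shell
#!/bin/sh
# Sketch only: best-effort node tuning that never fails the caller.
# SYSFS_ROOT and CPUPOWER are hypothetical overrides for experimentation.

write_setting() {
    # usage: write_setting <path relative to the sysfs root> <value>
    target="${SYSFS_ROOT:-/sys}/$1"
    if [ -w "$target" ] && echo "$2" > "$target"; then
        return 0
    fi
    echo "warning: could not set $target to $2" >&2
    return 0   # deliberately non-fatal
}

tune_node() {
    ksm_run="$1"       # 0 = stop KSM, 1 = run, 2 = stop and unmerge all pages
    clocksource="$2"   # e.g. tsc; should be listed in available_clocksource
    governor="$3"      # e.g. performance
    cpupower_bin="${CPUPOWER:-cpupower}"

    write_setting kernel/mm/ksm/run "$ksm_run"
    write_setting devices/system/clocksource/clocksource0/current_clocksource "$clocksource"

    # cpupower may be absent or unsupported (for example on some VMs);
    # warn and continue so node readiness is not affected.
    if command -v "$cpupower_bin" >/dev/null 2>&1; then
        "$cpupower_bin" frequency-set -g "$governor" >/dev/null 2>&1 ||
            echo "warning: cpupower could not set governor $governor" >&2
    else
        echo "warning: $cpupower_bin not found, skipping governor" >&2
    fi
    return 0
}
```

For example, `tune_node 1 tsc performance` would enable KSM, select the tsc clock source, and request the performance governor, logging a warning for any step that fails.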