crc-org / snc

Single Node Cluster creation scripts for OpenShift 4.x as used by CodeReady Containers
https://crc.dev
Apache License 2.0

Spike: Investigate the use of swap for OCP-4.15 to deal with default memory requirements #861

Open praveenkumar opened 4 months ago

praveenkumar commented 4 months ago

As per https://docs.openshift.com/container-platform/4.15/nodes/nodes/nodes-nodes-managing.html#nodes-nodes-swap-memory_nodes-nodes-managing it is possible to use swap, but this is in Tech Preview. I was trying it out to see how reliably we can start the cluster without increasing the resources on the crc side, because in 4.15 OVN-K is the default and the network operator requires more memory (~1.5G) with it than with SDN.

  1. Enable the Tech Preview feature set: this can be done using install-config or as a day-2 operation, see https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-cli_nodes-cluster-enabling
  2. Add the kernel argument swapaccount=1, which can be done with the following machine config
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 05-kernelarg-swapon
    spec:
      kernelArguments:
      - swapaccount=1
  3. Have a custom kubelet setting to enable swap
    oc label machineconfigpool master kubelet-swap=enabled

    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: swap-config
    spec:
      machineConfigPoolSelector:
        matchLabels:
          kubelet-swap: enabled
      kubeletConfig:
        failSwapOn: false
        memorySwap:
          swapBehavior: UnlimitedSwap
  4. Have a swap partition in the VM
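
Steps 1 and 4 have no snippet above, so here is a minimal sketch of what they could look like as a day-2 operation. It is not necessarily what was run for this spike. The FeatureGate manifest follows the OpenShift doc linked in step 1. The swap file path is taken from the swapon output in this issue, the 8192 MiB size is an assumption, and the function names render_featuregate and create_swapfile are made up for illustration. Making the swap file persistent across reboots is not covered.

```shell
#!/bin/sh
# Sketch only: step 1 (feature set) and step 4 (swap file) from the list above.

# Print a FeatureGate manifest that enables the Tech Preview feature set.
# Note: TechPreviewNoUpgrade cannot be undone and prevents minor version updates.
render_featuregate() {
  cat <<'EOF'
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade
EOF
}

# Create and activate a swap file (must run as root inside the VM).
# Path and size defaults are assumptions, not values confirmed by the issue.
create_swapfile() {
  swap_path="${1:-/var/vm/swapfile1}"
  swap_size_mib="${2:-8192}"
  mkdir -p "$(dirname "$swap_path")"
  # dd instead of fallocate so the file has no holes, which swapon requires
  dd if=/dev/zero of="$swap_path" bs=1M count="$swap_size_mib"
  chmod 600 "$swap_path"
  mkswap "$swap_path"
  swapon "$swap_path"
}

# Only print the manifest here; nothing is applied and no swap file is created.
render_featuregate
```

To apply it, something like `render_featuregate | oc apply -f -` against the cluster and `create_swapfile` as root inside the VM should do; both are left out of the script on purpose.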

After all those steps, swap is used for the workload and takes care of all the extra memory requirement, but it has some caveats, which are described in https://kubernetes.io/blog/2023/08/24/swap-linux-beta/. On the OpenShift side, since we enable the TechPreview feature gate, anything behind this gate is enabled automatically, which covers a lot of the things mentioned in the doc.

Node resources when swap is on (you can see memory is overcommitted because swap is taking the hit). I started this cluster with the default memory setting (9G):

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                3245m (85%)    0 (0%)
  memory             9967Mi (117%)  0 (0%)
  ephemeral-storage  0 (0%)         0 (0%)
  hugepages-1Gi      0 (0%)         0 (0%)
  hugepages-2Mi      0 (0%)         0 (0%)
<crc_vm>$ sudo swapon
NAME              TYPE SIZE   USED PRIO
/var/vm/swapfile1 file 7.9G 360.6M   -2
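
As a rough sanity check on the numbers above, the allocatable memory implied by "9967Mi (117%)" can be worked out with a few lines of shell. This is back-of-the-envelope arithmetic only: the percentage is rounded in the output, so the result is approximate, and the function names are made up for illustration. These are memory requests, not actual usage.

```shell
#!/bin/sh
# Sketch: derive the implied allocatable memory and the amount requested
# beyond it from the "Requests" column of the node description.

# Allocatable (Mi) implied by a request total and its rounded percentage.
implied_allocatable_mi() {
  awk -v req="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", req * 100 / pct }'
}

# Requests in excess of the implied allocatable memory (Mi).
overcommit_mi() {
  awk -v req="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", req - req * 100 / pct }'
}

echo "implied allocatable: $(implied_allocatable_mi 9967 117) Mi"
echo "requested beyond allocatable: $(overcommit_mi 9967 117) Mi"
```

This gives roughly 8519 Mi allocatable and about 1448 Mi requested beyond it, which is in the same ballpark as the ~1.5G extra attributed to OVN-K above.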

Should we go with this option and not update the resource limit on crc side or should we not use it because it is tech preview?

All this was done as a day-2 operation on our existing 4.15 bundle, so I am not sure how much the bundle size would increase if we do it.

cfergeau commented 3 months ago

Should we go with this option and not update the resource limit on crc side or should we not use it because it is tech preview?

In general swap is no magic bullet, it helps to overcommit, but the price to pay is slower performance. The more you overcommit, the slower your system will get. What is the impact here?

praveenkumar commented 3 months ago

In general swap is no magic bullet, it helps to overcommit, but the price to pay is slower performance. The more you overcommit, the slower your system will get. What is the impact here?

@cfergeau do you mean the impact on cluster performance? I didn't see any, but I also didn't run any workload. The Kubernetes blog already lists the caveats: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/#caveats

cfergeau commented 3 months ago

Is there an impact on cluster startup time?

gbraad commented 3 months ago

I am less concerned about the startup time than about overall use: introducing swap to avoid increasing the default memory might have effects there.


As such, we do not advocate the utilization of swap memory for workloads or environments that are subject to performance constraints. Furthermore, it is recommended to employ LimitedSwap, as this significantly mitigates the risks posed to the node.

'Performance constraints' might already apply to getting the cluster into a stable state (startup time), though I want to see an actual, representative payload to test this.

praveenkumar commented 3 months ago

Is there an impact on cluster startup time?

During my testing I didn't see any impact but let me create the bundle and then see.

cfergeau commented 3 months ago

Enable Tech Preview feature

What are the implications of this? This allows us to use swap, but does this also enable automatically other features we may want or not want?

praveenkumar commented 3 months ago

Enable Tech Preview feature

What are the implications of this? This allows us to use swap, but does this also enable automatically other features we may want or not want?

https://docs.openshift.com/container-platform/4.15/nodes/clusters/nodes-cluster-enabling-features.html#nodes-cluster-enabling-features-about_nodes-cluster-enabling has all the details about which features are automatically enabled (whether we want them or not).

cfergeau commented 3 months ago

Pod security admission enforcement. Enables the restricted enforcement mode for pod security admission. Instead of only logging a warning, pods are rejected if they violate pod security standards. (OpenShiftPodSecurityAdmission)

This one might be problematic? Though it looks like we can change back the value to be more permissive.

praveenkumar commented 3 months ago

After a bit more experimentation, it looks like swap is not as stable as I thought: stop => start always fails. Filed 2 different issues around swap.