Mirantis / cri-dockerd

dockerd as a compliant Container Runtime Interface for Kubernetes
Apache License 2.0
1.01k stars 272 forks source link

BestEffort pods are using swap #343

Open robertbotez opened 3 months ago

robertbotez commented 3 months ago

What happened?

I already opened a ticket on kube repo which leaded me here.

I was testing the support for swap and I came to an unexpected behavior. In the documentation it is specified that only pods that fall under the Burstable class can use the host's swap memory. However, I created both a deployment with 1 replica of ubuntu belonging to the Burstable class, and one belonging to the BestEffort class, where I ran the command stress --vm 1 --vm-bytes 6G --vm-hang 0 to see the consumption of memory made. The host has 4GB RAM memory and 5GB swap. In both situations, the pod started using swap after exceeding the RAM memory requirement. Wasn't the BestEffort pod supposed to be restarted when it reached the limit of the host's RAM memory? I mention that the kubelet is configured to swapBehavior=LimitedSwap. I attached two pictures where you can see the normal consumption of host, and consumption after running stress command inside pod

Screenshot 2024-03-27 at 12 02 30

. screenshot_2024-03-27_at_12 37 02

What did you expect to happen?

I expected the BestEffort pod to be killed when it consumes more RAM memory than the host have available.

How can we reproduce it (as minimally and precisely as possible)?

$ $ cat test.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ubuntu-deployment
  labels:
    app: ubuntu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ubuntu
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      containers:
      - name: ubuntu
        image: ubuntu:22.04
        resources:
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do sleep 30; done;" ]

$ cat /sys/fs/cgroup/memory.swap.max 
max

Anything else we need to know?

I am using cgroup v2.

Here is my kubelet config.

```console $ cat /var/lib/kubelet/config.yaml apiVersion: kubelet.config.k8s.io/v1beta1 authentication: anonymous: enabled: false webhook: cacheTTL: 0s enabled: true x509: clientCAFile: /etc/kubernetes/pki/ca.crt authorization: mode: Webhook webhook: cacheAuthorizedTTL: 0s cacheUnauthorizedTTL: 0s cgroupDriver: systemd clusterDNS: - 10.96.0.10 clusterDomain: cluster.local containerRuntimeEndpoint: "" cpuManagerReconcilePeriod: 0s enableServer: true evictionPressureTransitionPeriod: 0s failSwapOn: false featureGates: NodeSwap: true fileCheckFrequency: 0s healthzBindAddress: 127.0.0.1 healthzPort: 10248 httpCheckFrequency: 0s imageMaximumGCAge: 0s imageMinimumGCAge: 0s kind: KubeletConfiguration logging: flushFrequency: 0 options: json: infoBufferSize: "0" verbosity: 0 memorySwap: swapBehavior: LimitedSwap nodeStatusReportFrequency: 0s nodeStatusUpdateFrequency: 0s resolvConf: /run/systemd/resolve/resolv.conf rotateCertificates: true runtimeRequestTimeout: 0s shutdownGracePeriod: 0s shutdownGracePeriodCriticalPods: 0s staticPodPath: /etc/kubernetes/manifests streamingConnectionIdleTimeout: 0s syncFrequency: 0s volumeStatsAggPeriod: 0s ```

Kubernetes version

```console $ kubectl version Client Version: v1.29.3 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.29.3 ```

Cloud provider

Hetzner Cloud, but Kubernetes was deployed using `kubeadm`.

OS version

```console $ cat /etc/os-release PRETTY_NAME="Ubuntu 22.04.4 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.4 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy $ uname -a Linux fs-kube-dev-1 5.15.0-100-generic #110-Ubuntu SMP Wed Feb 7 13:27:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux ```

Install tools

```console $ kubeadm version kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.3", GitCommit:"6813625b7cd706db5bc7388921be03071e1a492d", GitTreeState:"clean", BuildDate:"2024-03-15T00:06:16Z", GoVersion:"go1.21.8", Compiler:"gc", Platform:"linux/amd64"} ```

Container runtime (CRI) and version (if applicable)

```console $ cri-dockerd --version cri-dockerd 0.3.11 (9a8a9fe) ```

Related plugins (CNI, CSI, ...) and versions (if applicable)

calico: version: 3.27.2
iholder101 commented 3 months ago

/cc

neersighted commented 3 months ago

This is because KEP 2400 was never supported, as best I can tell.

kannon92 commented 3 months ago

yea, its more of a feature request for KEP 2400. I was hoping someone in the cri-dockerd could explore implementing this?

neersighted commented 3 months ago

PRs are welcome, and a couple of the regular contributors have done other KEP enablement work and might be interested in picking this up (but also I can't speak for their interest or priorities).

afbjorklund commented 3 months ago

I think it was a known issue, support for Docker is not a requirement for adding new features.

Memory QoS in Alpha phase is designed to support containerd and cri-o.