elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

InitContainer elastic-internal-init-filesystem OOMKilled #6406

Open jorpilo opened 1 year ago

jorpilo commented 1 year ago

Bug Report

What did you do? Multiple Elasticsearch pods are repeatedly being OOMKilled during initialization. This happens constantly, many times a day, across several environments, and it is slowing down pod startup.

What did you expect to see? Elasticsearch pods starting as expected and not being killed.

What did you see instead? Under which circumstances? Elasticsearch pods are being OOMKilled during initialization because the initContainer elastic-internal-init-filesystem runs out of memory: the cp command in prepare-fs.sh uses more than the 50Mi memory limit set on the initContainer. Should the initContainer's memory limit be increased to 100Mi?
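If the 50Mi default turns out to be the bottleneck, one possible stopgap is to override the init container's resources through the pod template, assuming the operator merges an init container declared with the same name into the one it generates (the same mechanism the plugin workaround later in this thread relies on). A minimal sketch; the cluster name, version, and the 100Mi value are illustrative:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: example                    # illustrative name
spec:
  version: 8.6.2                   # illustrative version
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        initContainers:
        # Same name as the operator-generated init container, so the resource
        # override is merged into it instead of adding a second container.
        - name: elastic-internal-init-filesystem
          resources:
            requests:
              memory: 100Mi
            limits:
              memory: 100Mi
```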

Environment

```
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.4", GitCommit:"872a965c6c6526caa949f0c6ac028ef7aff3fb78", GitTreeState:"clean", BuildDate:"2022-11-09T13:36:36Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.901", GitCommit:"5fba0fed0cbeefb95cc12d79b4b2db53692d171b", GitTreeState:"clean", BuildDate:"2023-01-04T10:50:45Z", GoVersion:"go1.18.9", Compiler:"gc", Platform:"linux/amd64"}
```

thbkrkr commented 1 year ago

Hello,

This is surprising. 50Mi should normally be enough for that init container.

A bit of history: we started with 10Mi and increased it to 20Mi to fix an issue with CRI-O where 10Mi was too low. Then we increased it to 50Mi on Jan 21st 2020, after users reported OOMKilled containers with 20Mi. In the three years since, we have not heard of this issue again.

Does the issue constantly appear? What distro of k8s are you using?

jorpilo commented 1 year ago

The issue is appearing multiple times across different environments.

We are using vanilla k8s.

flikweert44 commented 1 year ago

Similar here on ECK version 2.6.1; it happened after upgrading from k8s 1.24.6 to 1.25.3. In our case it seems to be the main container that gets OOMKilled, while the init containers complete.

The same cluster config works fine with ECK 2.6.1 on k8s 1.24.6.

I increased the memory requests to 20Gi, but that didn't help either.


```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: tracing
spec:
  version: 7.17.9
  http:
    service:
      spec:
        clusterIP: None
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 20Gi
              cpu: 4
            limits:
              memory: 20Gi
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 16Gi
        storageClassName: default
    config:
      node.store.allow_mmap: false
      logging.level: debug
```
zeezoich commented 1 year ago

This issue and my issue https://github.com/elastic/cloud-on-k8s/issues/6542 appear to have the same root cause: the init container elastic-internal-init-filesystem uses too much ephemeral storage because it copies the plugins within the pod, even when the plugins have already been installed in a custom Elasticsearch image. You'll have a higher chance of reproducing this issue if you use large plugins. Feel free to close #6542 as it seems to be a duplicate of this issue.

zeezoich commented 1 year ago

Additionally, for anyone interested in a workaround to this issue: you'll need to introduce a persistent volume to house the plugins instead of using ephemeral storage. You'll need to customize the init container elastic-internal-init-filesystem:

```yaml
- name: elastic-internal-init-filesystem
  volumeMounts:
  - name: elastic-internal-elasticsearch-plugins-local
    mountPath: /mnt/elastic-internal/elasticsearch-plugins-local
```
You then have to add a volumeClaimTemplate named elastic-internal-elasticsearch-plugins-local

```yaml
- metadata:
    name: elastic-internal-elasticsearch-plugins-local
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi # or some reasonable value
    storageClassName: <your-storage-class-name>
```

These are only fragments, not complete manifests, but they should convey the idea.
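Putting the two fragments together, a rough sketch of where they would sit in a full Elasticsearch manifest might look like the following. The volume name and mount path are taken from the snippets above; the cluster name, version, node count, and storage class are placeholders, and it is assumed that ECK merges the extra volumeMounts entry into the elastic-internal-init-filesystem container it generates:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: example                          # placeholder name
spec:
  version: 7.17.9                        # version used earlier in this thread
  nodeSets:
  - name: default
    count: 3                             # placeholder count
    podTemplate:
      spec:
        initContainers:
        # Same name as the operator-generated init container so the extra
        # volume mount is merged into it (assumption based on the workaround above).
        - name: elastic-internal-init-filesystem
          volumeMounts:
          - name: elastic-internal-elasticsearch-plugins-local
            mountPath: /mnt/elastic-internal/elasticsearch-plugins-local
    volumeClaimTemplates:
    # Persistent volume that holds the copied plugins instead of ephemeral storage.
    - metadata:
        name: elastic-internal-elasticsearch-plugins-local
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi                 # or some reasonable value
        storageClassName: standard       # placeholder storage class
```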

naemono commented 1 year ago

@jorpilo Could you please share more information with us, such as the Elasticsearch version and as complete a YAML manifest as possible, so we can try to replicate? Is there something special about the ES container you're using?

Could you also share either the k8s Events, or the Status sub-object of Elasticsearch showing exactly which pods are getting OOM killed?

Thank you.

@zeezoich your issue appears to be different: this one is an OOM kill, while yours seems to be an ephemeral-storage limit (disk related, not memory).

zeezoich commented 1 year ago

@naemono In our case the ephemeral storage is backed by RAM. That's why I suspected it's the same issue.

barkbay commented 1 year ago

> Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.901"

Looks like I missed a couple of k8s releases 😄 Did you build K8S yourself?

> This is due to the cp command in prepare-fs.sh using more than the 50Mi memory limit set on the initContainer

@jorpilo As mentioned by @naemono we need more information. Would it be possible to also share:

zeezoich commented 1 year ago

@barkbay Just curious: why are the plugins copied from one directory inside the container to another? In my case, because the plugins are relatively huge (> 3GB), this causes the pod to be evicted for exceeding the ephemeral storage limit. I am just trying to understand this step and whether there is a flag to skip the copying of plugins. Thanks.

jorpilo commented 1 year ago

Hello, sorry for the delay. Kernel messages:

```
mamba invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=979
oom_kill_process.cold+0xb/0x10
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=f4b8c4eca72c984ac986f74421ca27a0b85a22c6cb3e826d79a6760bd9770311,mems_allowed=0,oom_memcg=/kubepods/burstable/poda858f6f1-ac60-4a58-9ed2-bcccdd287994,task_memcg=/kubepods/burstable/poda858f6f1-ac60-4a58-9ed2-bcccdd287994/f4b8c4eca72c984ac986f74421ca27a0b85a22c6cb3e826d79a6760bd9770311,task=mamba,pid=1641300,uid=185
Memory cgroup out of memory: Killed process 1641300 (mamba) total-vm:4797068kB, anon-rss:4100028kB, file-rss:20916kB, shmem-rss:0kB, UID:185 pgtables:8228kB oom_score_adj:979
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
hubble invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-997
oom_kill_process.cold+0xb/0x10
Memory cgroup out of memory: Killed process 8248 (hubble) total-vm:832316kB, anon-rss:121520kB, file-rss:25932kB, shmem-rss:0kB, UID:0 pgtables:444kB oom_score_adj:-997
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=86624e560d6b48f099bf2782bc7298cb8dee5f7a68a88bef83775a9d5a09bd74,mems_allowed=0,oom_memcg=/kubepods/burstable/pode6b72518-41c3-4812-a8aa-1cfcb8108fa9/86624e560d6b48f099bf2782bc7298cb8dee5f7a68a88bef83775a9d5a09bd74,task_memcg=/kubepods/burstable/pode6b72518-41c3-4812-a8aa-1cfcb8108fa9/86624e560d6b48f099bf2782bc7298cb8dee5f7a68a88bef83775a9d5a09bd74,task=hubble,pid=8248,uid=0
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
oom_kill_process.cold+0xb/0x10
```

CRI implementation: containerd://1.6.12