apache / couchdb-helm

Apache CouchDB Helm Chart
https://couchdb.apache.org/
Apache License 2.0

Excessive memory consumption of CouchDB pods when running on distributions with cgroup v2 #161

Open cax21 opened 5 months ago

cax21 commented 5 months ago

Describe the bug: CouchDB pods will not start (mainly because they are OOM-killed). If we set no resource limits, we can observe the CouchDB pods using a peak of almost 3.5 GB of memory; usage then stabilizes at around 1.6 GB, which is far too much in any case. Note that this is not observed at all on nodes running cgroup v1, so we suspect cgroup v2 is the root cause of this resource issue.

Version of Helm and Kubernetes:

- Kubernetes: 1.27.6 (Rancher)
- Helm: 3.15
- Nodes: Rocky Linux 9.3
- Linux kernel: using cgroup v2 (see https://kubernetes.io/docs/concepts/architecture/cgroups/)
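For reference, the Kubernetes page linked above describes how to confirm which cgroup version a node is running; a quick check on one of the nodes looks like this (the expected outputs are the ones documented there):

```sh
# Print the filesystem type mounted at the cgroup root.
# "cgroup2fs" indicates cgroup v2; "tmpfs" indicates cgroup v1.
stat -fc %T /sys/fs/cgroup/
```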

What happened: At startup, the CouchDB pods use too much memory (almost 4 GB) and get killed unless the resource limits are raised to accommodate such a high load.

What you expected to happen: At startup, CouchDB pods should consume only a few MB of memory, as they do on nodes running cgroup v1.

How to reproduce it (as minimally and precisely as possible): You can easily reproduce this by deploying the latest chart (4.5.6) on a cluster like the one described above. You will notice the 3 CouchDB pods taking more than 3 GB of memory.
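Concretely, a minimal reproduction along those lines (the repo URL and the required uuid value follow the chart's README; the release name and label selector are assumptions, and `kubectl top` needs metrics-server installed):

```sh
# Install chart version 4.5.6 on a cgroup v2 cluster.
helm repo add couchdb https://apache.github.io/couchdb-helm
helm install couchdb couchdb/couchdb --version 4.5.6 \
  --set couchdbConfig.couchdb.uuid="$(uuidgen | tr -d '-')"

# Shortly after startup, watch the three pods climb past 3 GB.
kubectl top pods -l app=couchdb
```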

Anything else we need to know: Perhaps the chart could offer some specific tuning for handling cgroup v2 nodes, if possible. We haven't found anything about this in the documentation.
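For now, the only knob we found is the chart's standard `resources` value; a minimal sketch of the workaround we would rather avoid (it assumes a release named "couchdb", and the figures are illustrative, sized only to survive the startup peak we observed):

```sh
# Illustrative workaround: raise the memory limit high enough to survive
# the ~3.5 GB startup peak observed on cgroup v2 nodes, via the chart's
# standard "resources" value. Not a recommendation.
cat > couchdb-resources.yaml <<'EOF'
resources:
  requests:
    cpu: "1"
    memory: "512Mi"
  limits:
    cpu: "2"
    memory: "4Gi"
EOF

helm upgrade couchdb couchdb/couchdb --reuse-values -f couchdb-resources.yaml
```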

willholley commented 5 months ago

I'm not aware of any specific tuning for cgroups v2 - you are likely the first to experiment with this. If you have proposals for configuration changes that would be useful in the helm chart, please feel free to submit a PR.

cax21 commented 5 months ago

It seems to be an issue in k8s itself, I think (the way it manages memory using cgroup v2). I've tried launching on the k8s v1.30 equivalent (in fact a Rancher k3s) running on Rocky Linux 9.4 nodes, and I see no problem at all. All pods start correctly and consume less than 256 MB. I don't think there is anything wrong in the chart itself.