FoundationDB / fdb-kubernetes-operator

A kubernetes operator for FoundationDB
Apache License 2.0

Pods being OOMKilled #2163

Open happysalada opened 2 days ago

happysalada commented 2 days ago

What happened?

Pods keep getting OOMKilled.

What did you expect to happen?

Pods don't get OOMKilled.

How can we reproduce it (as minimally and precisely as possible)?

I've deployed the minimal example, giving each pod 8 GB of memory. If you throw any load at the cluster, pods start getting OOMKilled.

Anything else we need to know?

Is there a setting that I'm missing, perhaps? The memory setting doesn't seem to be respected and the pods keep getting killed. I'm not sure what my setup is missing.

FDB Kubernetes operator

```console
$ kubectl fdb version
# 1.48.0
```

Kubernetes version

```console
$ kubectl version
# 1.24.1
```

Cloud provider

On-prem cluster
johscheuer commented 1 day ago

Could you provide some more details? Which pods are OOM killed? Could you provide an example of the used memory for the processes? Are the processes OOM killed by Linux or is the OOM triggered by FDB itself? Can you share your FoundationDBCluster spec?

happysalada commented 7 hours ago

The OOMKilled pods are the storage ones as far as I can tell, named NAME-storage-NNNN. The OOM killer seems to be triggered by Kubernetes after the pods overstep their memory allocation, not by FDB itself. The way I track that is with the container_oom_events_total metric from Kubernetes. Here is what I customized from the operator defaults:

processCounts:
  stateless: 10

For the foundationdb container I'm requesting 1 CPU and 4 GB of memory.

I'm using FoundationDB 7.3.43.

I also use useDNSInClusterFile, but I don't think it should matter.
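
Putting it together, the relevant part of the spec looks roughly like this (a sketch reconstructed from the operator examples rather than copied from my real manifest, so the cluster name and exact layout are illustrative):

```yaml
# Rough sketch of the overrides described above; "sample-cluster" is a
# placeholder, not my actual cluster name.
apiVersion: apps.foundationdb.org/v1beta2
kind: FoundationDBCluster
metadata:
  name: sample-cluster
spec:
  version: 7.3.43
  processCounts:
    stateless: 10
  routing:
    # if I remember the spec layout correctly, this is where the DNS flag lives
    useDNSInClusterFile: true
  processes:
    general:
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              resources:
                requests:
                  cpu: 1
                  memory: 4Gi
                limits:
                  cpu: 1
                  memory: 4Gi
```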

I remember reading that 4 GB was enough, but I guess this is the problem then? Maybe I should set the max memory in the config? Or just increase every pod to 8 GB.
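
What I mean by setting the max memory is something like the following (an untested sketch; the `memory=...` custom parameter spelling is my assumption based on fdbserver's --memory option, I haven't verified the operator accepts it):

```yaml
spec:
  processes:
    general:
      # assumed spelling of fdbserver's memory limit as a custom parameter,
      # aligned with the pod memory limit below
      customParameters:
        - "memory=8GiB"
      podTemplate:
        spec:
          containers:
            - name: foundationdb
              resources:
                limits:
                  memory: 8Gi
```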

I run a cluster on bare metal with default settings, and the processes are known to go up to 12 GB sometimes. Since the bare-metal machines have way more memory, it doesn't cause any problems there.