carlosedp / cluster-monitoring

Cluster monitoring stack for clusters based on Prometheus Operator
MIT License

Kubernetes Resource Overcommitting #84

Status: Closed (Frettarix closed this 4 years ago)

Frettarix commented 4 years ago

Hi,

I've noticed that the `prometheus-k8s-0` pod has its request AND limit set to the same values: cpu: 100m, memory: 25Mi.

But it actually uses more than that (see the attached screenshot).

By changing this, are we able to prevent the overcommitting error?

carlosedp commented 4 years ago

These defaults come from the upstream libraries. They can be overridden in `base_operator_stack.jsonnet`.

carlosedp commented 4 years ago

Could be something like:

diff --git a/base_operator_stack.jsonnet b/base_operator_stack.jsonnet
index e8fca74..8a37783 100644
--- a/base_operator_stack.jsonnet
+++ b/base_operator_stack.jsonnet
@@ -69,11 +69,21 @@ local vars = import 'vars.jsonnet';
   prometheus+:: {
     // Add option (from vars.yaml) to enable persistence
     local pvc = k.core.v1.persistentVolumeClaim,
+
     prometheus+: {
+      local statefulSet = k.apps.v1.statefulSet,
+      local container = statefulSet.mixin.spec.template.spec.containersType,
+      local resourceRequirements = container.mixin.resourcesType,
+      local resources =
+        resourceRequirements.new() +
+        resourceRequirements.withRequests({ cpu: '200m', memory: '200Mi' }) +
+        resourceRequirements.withLimits({ cpu: '400m', memory: '400Mi' }),
+
       spec+: {
                // Here one can use parameters from https://coreos.com/operators/prometheus/docs/latest/api.html#prometheusspec
                replicas: $._config.prometheus.replicas,
                retention: vars.prometheus.retention,
+               resources: resources,
                scrapeInterval: vars.prometheus.scrapeInterval,
                scrapeTimeout: vars.prometheus.scrapeTimeout,
                externalUrl: 'http://' + $._config.urls.prom_ingress,

Then rebuild the manifests with `make`.
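For reference, a sketch of what the relevant fragment of the regenerated Prometheus custom resource should look like after this change. The `resources` field names follow the Prometheus Operator API; the object name `k8s` and namespace `monitoring` are the usual kube-prometheus defaults and may differ in your deployment:

```yaml
# Rendered manifest fragment (illustrative, not the full object)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  resources:
    requests:
      cpu: 200m
      memory: 200Mi
    limits:
      cpu: 400m
      memory: 400Mi
```

With requests raised above the observed usage and limits set at double the requests, the scheduler reserves realistic capacity and the overcommit warning should no longer trigger for this pod.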

carlosedp commented 4 years ago

Customization provided.