giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

`mimir-querier` needs a lot of resources #3687

Open QuantumEnigmaa opened 2 weeks ago

QuantumEnigmaa commented 2 weeks ago

Lately, we've been paged repeatedly by the MimirHPAReachedMaxReplicas alert, mostly for the querier component. This component currently has its HPA maxReplicas set to 10, with default resource requests of 100m CPU and 500Mi memory.

According to the following per-pod resource usage, CPU is fine but memory could use a small increase: Image

From our current understanding, the querier needs to scale up whenever someone uses an installation's Grafana, which would mean that it should scale down during the night. Because of this, I would advocate increasing maxReplicas to 15 and slightly increasing the default memory requests in the shared-configs repo (i.e. something like 650~700Mi), to avoid having to keep raising maxReplicas forever.
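To illustrate why raising the memory request can relieve pressure on maxReplicas, here is a minimal sketch of the scaling rule documented for the Kubernetes HPA (desired = ceil(current * currentUtilization / targetUtilization), with a 10% tolerance). It assumes the querier HPA scales on memory utilization relative to requests, which this thread does not confirm. The 80% target and the 600Mi average usage are hypothetical values chosen for illustration only.

```python
def desired_replicas(current, avg_usage_mib, request_mib,
                     target_pct, min_replicas, max_replicas):
    """Approximate the documented HPA rule for a resource-utilization metric.

    Utilization is average usage divided by the pod's resource *request*,
    so a larger request lowers utilization for the same usage.
    """
    # ratio = (usage / request) / (target_pct / 100), kept as integers
    num = avg_usage_mib * 100
    den = request_mib * target_pct
    # Default HPA tolerance: do not scale if the ratio is within 10% of 1.0
    if abs(num / den - 1.0) <= 0.1:
        desired = current
    else:
        desired = -(-current * num // den)  # integer ceiling division
    # The result is always clamped to the HPA bounds
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical inputs: 10 replicas, 600Mi average usage, 80% target.
# With a 500Mi request the HPA wants 15 replicas but is capped at 10,
# which is the situation that fires MimirHPAReachedMaxReplicas.
print(desired_replicas(10, 600, 500, 80, 2, 10))  # 10 (capped, wants 15)
print(desired_replicas(10, 600, 500, 80, 2, 15))  # 15
# With a 700Mi request the same usage falls within tolerance: no scale-up.
print(desired_replicas(10, 600, 700, 80, 2, 10))  # 10
```

If the HPA actually scales on CPU or on a custom metric, the memory request would not influence the replica count this way, and only the maxReplicas change would affect the alert.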

hervenicol commented 2 weeks ago

Looks like the RAM usage is quite stable over time, as you can see over 10 days here: Image

I've increased its RAM reservations to 1GB on a few installations:

Let's see in a while if it stabilises at 512MB or if it grows up to 1GB.

We could also do a bit of profiling (with pprof) to try to understand what uses this RAM.
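As a starting point for the profiling idea, the sketch below downloads a heap profile over HTTP so it can be inspected with `go tool pprof`. It assumes a querier pod has been port-forwarded to localhost, that it serves the standard Go pprof handlers under `/debug/pprof/`, and that its HTTP port is 8080. The port, the output file name, and both helper functions are illustrative assumptions, not taken from this thread.

```python
import urllib.request


def heap_profile_url(host="localhost", port=8080):
    """Build the URL of the standard Go pprof heap endpoint (assumed port)."""
    return f"http://{host}:{port}/debug/pprof/heap"


def download_heap_profile(url, dest="querier-heap.pb.gz", timeout=30):
    """Fetch the heap profile and write it to a local file (hypothetical name)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(dest, "wb") as out:
        out.write(resp.read())
    return dest


# Example usage, once a port-forward to a querier pod is running:
#   path = download_heap_profile(heap_profile_url())
# Then inspect the largest in-use allocations with:
#   go tool pprof -sample_index=inuse_space -top querier-heap.pb.gz
```

Comparing a profile taken during heavy Grafana use with one taken at a quiet time might show whether the memory is held by query execution or by something long-lived.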

QuentinBisson commented 1 day ago

Increased to 1GB on enigma as well