Closed Rotfuks closed 3 months ago
@hervenicol What's your opinion on this? Since all Loki components have an assigned HPA, does it make sense to be alerted when the assigned resources are not actually used, in the same way we do it for Mimir? I mean, the HPAs should scale down the components at some point (after having flushed all the data to the object storage), right?
HPAs will scale down until minpods.
Then, for instance with loki-write pods' RAM, the minimum size is 2 pods with 4GB requests and an 8GB limit.
On some installations, that may still be more than needed.
So, maybe we want an alert that tells us when those pods are underused.
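To make the idea concrete, here is a minimal sketch of the check such an alert would encode. It assumes the 4GB request applies per pod, and the `Floor` type, the `is_underused` helper, the 50% threshold and the 1.5GB usage figure are all hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Floor:
    """Smallest footprint the HPA can scale down to."""
    min_pods: int
    request_gb: float  # memory request, assumed to be per pod


def reserved_gb(floor: Floor) -> float:
    # Memory that stays reserved even when the HPA is at minpods.
    return floor.min_pods * floor.request_gb


def is_underused(floor: Floor, used_gb: float, threshold: float = 0.5) -> bool:
    # Hypothetical rule: flag when usage is below `threshold` of the reserved floor.
    return used_gb / reserved_gb(floor) < threshold


# loki-write numbers from this thread: 2 pods with 4GB requests.
loki_write = Floor(min_pods=2, request_gb=4.0)
print(reserved_gb(loki_write))                 # 8.0
print(is_underused(loki_write, used_gb=1.5))   # True
```

The point is only that the HPA cannot go below this floor, so a small installation keeps 8GB of requests reserved whatever its real usage is.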
The part I'm not confident about is how to know when we should revert these changes.
Because, say we reduced requests/limits to 1GB/2GB.
Then, the installation grows, and HPA adds new pods: that's expected.
But we need to review the requests/limits at some point.
If we do it when usage is over 90% (i.e. HPA's scale-up threshold), it means it will happen when HPA is maxed out at maxpods (25 for loki-write from what I can see on golem :astonished:). Or can we have a better alert condition?
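The concern can be illustrated with a rough steady-state model of the HPA. This is a simplification, not the real controller: it ignores the HPA tolerance and stabilization windows, and assumes the 1GB request and the 90% target apply per pod. The function names and the usage figures are hypothetical:

```python
import math


def replicas_for(usage_gb, request_gb, target=0.9, min_pods=2, max_pods=25):
    # Steady state of the HPA rule desired = ceil(current * utilization / target),
    # clamped between minpods and maxpods.
    wanted = math.ceil(usage_gb / (request_gb * target))
    return max(min_pods, min(max_pods, wanted))


def utilization(usage_gb, request_gb, pods):
    # Average memory usage per pod as a fraction of its request.
    return usage_gb / (request_gb * pods)


# Reduced requests of 1GB, total usage growing over time (illustrative values).
for usage in (1.0, 10.0, 22.0, 24.0):
    pods = replicas_for(usage, request_gb=1.0)
    print(usage, pods, round(utilization(usage, 1.0, pods), 2))
```

In this model the HPA keeps adding pods so that utilization stays at or below 90%, and utilization only rises above that once the replica count is already capped at 25. An alert on "usage over 90%" would therefore fire only after the HPA has run out of room.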
I'm not very comfortable with this issue, as HPAs are supposed to do the job. This issue tries to improve a case where we have small installations that don't use all of Loki's reserved resources. Maybe we should start by checking the current situation: do we have installations where that's the case, and how critical is it?
It could be interesting to discuss it during refinement :)
Both alerts and the corresponding ops recipe have been created and released.
Motivation
We already have VPA for Mimir and Loki, but with HPA it's a bit tricky. We can manually rightsize Mimir and Loki, but we need an alert to know when we can do it.
Todo
For both Mimir and Loki
Outcome