I realized now that this will also prevent the pod-count-based scaling from happening if we have an active canary deployment (because that also lives in an additional ReplicaSet).
But I think that's not a huge issue.
Shouldn't we just change getMinPodCountBasedOnCurrentPodCount to calculate the value based not on the current pod count but on the current min value? And only apply the ratio to limit the speed of shrinking if the new desired min value is smaller than the one last stored in the HPA?
I'm not sure how that would work.
What I don't see then is: when would the minCount ever increase?
Let's say at night the traffic goes down, and we have 10 pods, and the minCount is 9.
During the day the traffic slowly increases, and by peak time we have 20 pods.
But if we calculated the new minCount based on the current minCount, then it'd just stay at 9, right? So if the traffic suddenly drops, nothing would prevent the HPA from scaling down to 9 pods in one go.
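To make the concern concrete, here's a minimal sketch (the function name and the ratio handling are made up for illustration, this isn't the actual controller code):

```go
package main

import "fmt"

// Hypothetical variant that derives the new minimum purely from the
// previously stored minimum, never from the actual pod count.
func newMinFromStoredMin(storedMin int, ratio float64) int {
	candidate := int(float64(storedMin) * ratio)
	if candidate < 1 {
		candidate = 1
	}
	return candidate
}

func main() {
	storedMin := 9
	actualPods := []int{10, 12, 16, 20} // traffic (and pod count) grows during the day

	for _, pods := range actualPods {
		// Even with ratio 1.0 there is nothing that can push the value upwards.
		storedMin = newMinFromStoredMin(storedMin, 1.0)
		fmt.Printf("pods=%d storedMin=%d\n", pods, storedMin)
	}
	// storedMin is still 9 while we run 20 pods, so a sudden traffic drop
	// would let the HPA go from 20 straight down to 9.
}
```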
Sorry if I'm missing something.
The controller sets minReplicas based on a Prometheus query; this query shouldn't result in a higher value during a deployment if you ask me, right?
And if the desired value of minReplicas (coming out of the query) is smaller than the current value, we should use the value of the hpa-scaler-scale-down-max-ratio annotation (if set) to slow down the decrease of minReplicas.
I guess it currently uses the actual number of pods instead of the current value of minReplicas to make it independent of the interval of the controller?
An example with increasing traffic
Current minReplicas: 25
Desired minReplicas: 30
We can immediately set minReplicas to 30.
An example with decreasing traffic
Current minReplicas: 25
Desired minReplicas: 10
hpa-scaler-scale-down-max-ratio: 0.9
Since the desired minReplicas is smaller than the current minReplicas, we take the scale-down ratio into account:
We set minReplicas to 0.9 * 25 = 22.5
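A minimal sketch of that rule, assuming a hypothetical calculateNewMinReplicas helper (only the hpa-scaler-scale-down-max-ratio annotation name is from the controller, everything else here is made up for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// calculateNewMinReplicas: desiredMin comes out of the Prometheus query,
// currentMin is what is currently stored in the HPA, and scaleDownMaxRatio
// is the value of the hpa-scaler-scale-down-max-ratio annotation
// (0 meaning the annotation isn't set).
func calculateNewMinReplicas(currentMin, desiredMin int32, scaleDownMaxRatio float64) int32 {
	// Increasing (or unchanged) minReplicas is applied immediately.
	if desiredMin >= currentMin {
		return desiredMin
	}
	// Decreasing without a ratio configured is also applied immediately.
	if scaleDownMaxRatio <= 0 || scaleDownMaxRatio >= 1 {
		return desiredMin
	}
	// Decreasing with a ratio: don't drop below ratio * currentMin in one iteration.
	limit := int32(math.Floor(float64(currentMin) * scaleDownMaxRatio))
	if desiredMin > limit {
		return desiredMin
	}
	return limit
}

func main() {
	fmt.Println(calculateNewMinReplicas(25, 30, 0.9)) // 30 – increase applied immediately
	fmt.Println(calculateNewMinReplicas(25, 10, 0.9)) // 22 – limited to 0.9 * 25, rounded down
}
```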
Does this make sense? Or am I missing something important?
Ah, clear, now I see what you mean.
The thing is that I'd like to make it possible to use hpa-scaler-scale-down-max-ratio independently of (and even without) the Prometheus-based scaling, because making the scale-down gradual can be useful even (or especially?) in situations when we can't use the Prometheus-based scaling.
If Prometheus is unavailable we probably shouldn't modify anything, right?
The way I intend hpa-scaler-scale-down-max-ratio to be used (and that's how we use it now) is that the deployment is scaled by an HPA based on CPU load, and then we constantly adjust minPods based on hpa-scaler-scale-down-max-ratio and the current pod count. That way we prevent sudden downscaling.
So this doesn't need a Prometheus query at all, but it's prone to the issue described in #5
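A rough sketch of that mode (the helper name is hypothetical; only the annotation name is real):

```go
package main

import (
	"fmt"
	"math"
)

// minReplicasFromPodCount is a made-up name: the floor simply follows the
// actual pod count at a limited speed, without any Prometheus query.
func minReplicasFromPodCount(actualPodCount int32, scaleDownMaxRatio float64) int32 {
	return int32(math.Floor(float64(actualPodCount) * scaleDownMaxRatio))
}

func main() {
	// 20 pods at peak with hpa-scaler-scale-down-max-ratio: 0.9 keeps the floor
	// at 18, so a sudden drop in load can't collapse the deployment in one HPA step.
	fmt.Println(minReplicasFromPodCount(20, 0.9)) // 18
}
```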
Ah, I didn't understand that you want to be able to run it without Prometheus, because I originally created it just for that. But it makes sense to slow down the shrinking of deployments even without using Prometheus queries. Perhaps you can check whether a Prometheus query is set in the annotation: if it isn't, limit the scale-down speed based on the actual pods; otherwise base it on the current minReplicas in the HPA.
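Something along these lines, perhaps (a sketch only: the hpa-scaler-prometheus-query annotation name and the dispatch function are assumptions for illustration, not the controller's real identifiers):

```go
package main

import "fmt"

// pickScaleDownBase is a hypothetical dispatch: if a Prometheus query
// annotation is present, limit the scale-down based on the current
// minReplicas, otherwise fall back to the actual pod count.
func pickScaleDownBase(annotations map[string]string, currentMinReplicas, actualPodCount int32) int32 {
	if _, hasQuery := annotations["hpa-scaler-prometheus-query"]; hasQuery {
		return currentMinReplicas
	}
	return actualPodCount
}

func main() {
	withQuery := map[string]string{"hpa-scaler-prometheus-query": "sum(rate(http_requests_total[5m]))"}
	withoutQuery := map[string]string{}

	fmt.Println(pickScaleDownBase(withQuery, 25, 40))    // 25 – apply the ratio to minReplicas
	fmt.Println(pickScaleDownBase(withoutQuery, 25, 40)) // 40 – apply the ratio to the actual pods
}
```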
It's just that figuring out whether a deployment is running with this brute-force approach seems prone to error as we grow the number of deployments. Since the Kubernetes API doesn't do paging, it might take too long to respond or use up too much memory. If that makes sense.
I looked for alternatives to figure out if a deployment was running, but couldn't find anything useful.
If it isn't, limit the scale-down speed based on the actual pods
I think that's what we're doing now, and that's causing the issue.
Since the Kubernetes API doesn't do paging, it might take too long to respond or use up too much memory. If that makes sense.
I tested this now: in our prod cluster with 2878 ReplicaSets it took between 4 and 6 seconds to retrieve them (although this was a client running on my machine and going through kubectl proxy, so with an in-cluster client it might be faster), and the memory usage increased by 46 MB when the ReplicaSets were retrieved.
So you're right, this is significantly heavier than retrieving the HPAs (that takes ~1 second and uses only 3 MB of memory for the 327 HPAs we have on prod).
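For reference, the measurement was roughly along these lines (a sketch with client-go, written against a recent version where List takes a context; not the exact code that was run):

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Talking to the API server through `kubectl proxy` running locally;
	// an in-cluster client would use rest.InClusterConfig() instead.
	config := &rest.Config{Host: "http://localhost:8001"}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	start := time.Now()
	replicaSets, err := clientset.AppsV1().ReplicaSets("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("retrieved %d ReplicaSets in %v\n", len(replicaSets.Items), time.Since(start))
}
```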
I'm not sure what the best workaround would be at this point; I'll also try to think about whether I can come up with some alternative approach.
Attempt at fixing #5