Autoscale observed value being doubled during scale in evaluation

bryandx commented 11 months ago

Describe the bug I've created autoscaling for an application and defined rules to scale in and out using the PodCpuUsage and PodMemoryUsage metrics. The attached file azure-autoscale-config.txt contains the Azure CLI commands I ran to create the autoscale profiles and rules. When the current instance count is 2, the autoscale rules will not scale in to 1 instance because the observed value used in the rule evaluation is doubled.

I've attached the AutoscaleEvaluationsLog query output as an Excel file (instashare-poc-autoscalelogs.xlsx) to show the details. I highlighted a few rows for correlationId 4292e174-c00c-4595-88c3-59c2213d8d0d. The PodMemoryUsage observed value for the scale out rule is 52.79545455. However, the observed value for the same metric in the scale in rule is 105.5909091, and for that evaluation the Projection column has a value of 2. The Projection column for the scale out evaluation was blank. I'm not sure where the Projection value comes from or why it differs between the scale in and scale out evaluations. Perhaps the Projection column has nothing to do with the doubled observed value, but it stood out to me as a possibility. The same thing also occurs for the PodCpuUsage rule evaluations.

To Reproduce Steps to reproduce the behavior:

1. Create a Spring App with autoscaling using a configuration similar to the one I've provided
2. Get 2 instances of the application running
3. Keep the application memory and CPU usage low enough that the scale in rules should apply
4. Observe that the application does not scale in

Expected behavior When the scale in rules are met, the application should scale in and reduce the instance count by 1

Can we contact you for additional details? Y/N Y

azure-autoscale-config.txt instashare-poc-autoscalelogs.xlsx

Sneezry commented 11 months ago

Hi @bryandx , thanks for the feedback. I have already contacted the autoscale team, and once I receive a reply, I will update it here.

Sneezry commented 11 months ago

I have already obtained key information from the autoscale and metric teams, but there are still some details that need to be further confirmed with them. Once I have all the necessary information, I will update it here as soon as possible. Thank you for your understanding and patience.

Sneezry commented 11 months ago

Hi @bryandx , I have got some information from Autoscale team and metric team.

The phenomenon you are seeing is as expected, and the metric data is correct.

Autoscale will first verify if executing a scale in action would immediately trigger another scale out rule, causing oscillation.

In your scenario there are currently 2 instances. Autoscale projects that after scaling in to 1 instance the per-instance metric would double (52.795 * 2 = 105.59), which is why the observed value is multiplied by 2.

Because this projected value would trigger the scale out rule (--scale out 1 --cooldown 5 --condition "PodMemoryUsage > 80"), autoscale takes no action in order to avoid oscillation: it skips the scale in rule (--scale in 1 --cooldown 5 --condition "PodMemoryUsage < 65 avg") even though the actual metric value is 52.7.

As for the Projection column: it is a scaling factor, in this case OldInstanceCount / NewInstanceCount.
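The projection described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this thread, not the autoscale service's actual implementation; the function names `projection` and `projected_value` are hypothetical.

```python
def projection(old_count: int, new_count: int) -> float:
    """Scaling factor described in the thread: old instance count / new instance count."""
    return old_count / new_count


def projected_value(observed: float, old_count: int, new_count: int) -> float:
    """Estimate the per-instance metric after the instance count changes."""
    return observed * projection(old_count, new_count)


# Values taken from the thread: 2 instances, PodMemoryUsage observed at 52.79545455.
observed = 52.79545455
after_scale_in = projected_value(observed, old_count=2, new_count=1)

SCALE_OUT_THRESHOLD = 80  # from the rule "PodMemoryUsage > 80"
# Scale in is skipped when the projected value would trigger the scale out rule.
scale_in_skipped = after_scale_in > SCALE_OUT_THRESHOLD

print(round(after_scale_in, 4))  # 105.5909
print(scale_in_skipped)          # True
```

The projected 105.59 matches the doubled observed value in the evaluation log, and because it exceeds 80 the scale in is skipped.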

Please feel free to ask if you have any further questions.

cc @zhiszhan

bryandx commented 11 months ago

@Sneezry Unfortunately that logic/calculation for scale in seems flawed. As can be seen by the attached screenshot, the average memory use for this application over the last 30 days is pretty constant and around 60% per instance. So no flapping would occur if the app is scaled in to 1 instance because the memory utilization would be under 65%. The memory usage will not just jump to 105% of the single instance during a scale in - it will still be around the 60% utilization like it consistently has been.

If that is how observed value is calculated for scale in, then wouldn't the same logic be used for scale out and prevent a scale out from occurring? For example, if a single instance of the application is using 82% of allocated memory, then shouldn't the observed value be 41% for scale out (since after scale out there are 2 instances running and 82 / 2 = 41) and therefore prevent a scale out from occurring? I'm glad the calculation value isn't working that way for scale out but I wanted to explain that calculation appears to be different for scale out.

So, if this isn't going to get fixed, how does anyone utilize metrics for autoscaling Spring Apps successfully? For this application example, what should we use for a PodMemoryUsage metric value to allow scale in to occur being that the application typically uses around 60% of memory? I don't want to create a scale in rule of PodMemoryUsage < 130 avg do I? That's the only way I can think of getting it to work based on the current logic of how observed value is being done for scale in and that doesn't make sense if the scale out rule of PodMemoryUsage > 80 avg is correct.

For this application's autoscale rules, I got the idea from this autoscaling best practices page https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-best-practices#considerations-for-scaling-when-multiple-rules-are-configured-in-a-profile and then modified our thresholds based upon actual application metrics.

cc @zhiszhan

instashare-poc-memoryusage

Sneezry commented 10 months ago

Hi @bryandx, this appears to be the expected behavior under the autoscale design. If you have further questions about autoscaling, I recommend creating a new support ticket for the Azure Autoscale service in the Azure Portal. Let me know if there's anything else I can assist you with!

bowen5 commented 9 months ago

Hi @bryandx

This behavior is by design. Let me describe it in more detail based on your case:

You have rules configured to:
1. scale in 1 instance when average memory usage < 65%
2. scale out 1 instance when average memory usage > 80%
In your case, when you have 2 instances whose average memory usage is 52.795%, the scaling logic will:
1. Evaluate rule 1 and find that it meets the scale in criteria. It then assumes the average memory usage would become 105.59% if the scale in happens.
  - That’s why you see a doubled metric value in the logs. The reasoning is: a workload of size X is evenly distributed across 2 instances and consumes 52.795% of memory on each. After scale in, the same workload X would consume 105.59% of memory on a single instance, since the workload is fixed.
  - The logic can be formulated as “A × M = (A - 1) × N”
    - A: Instance count before scale in
    - M: Average memory usage (or other metric) before scale in
    - A - 1: Instance count after scale in by 1
    - N: Average memory usage (or other metric) after scale in by 1
  - Since A = 2 in your case, this gives 2M = N, which is a doubled value.
  - Of course, the formula is not perfect in real scenarios, because it assumes that “with a given workload, the per-instance metric value is inversely proportional to the instance count”. However, it is a more balanced and self-contained logic.
2. Take the projected “105.59% if scale in happens” value and evaluate rule 2 against it, finding that it would need to scale out again. To avoid flapping (scale in -> scale out -> scale in -> scale out -> …), it skips the scale in. For more details, please refer to Autoscale flapping - Azure Monitor | Microsoft Learn.
To resolve the issue and make scaling work as expected, would you mind tuning your scaling configuration based on the logic above? For example, for the case of “scale between 1 and 2 instances based on memory usage”, how about updating the rules to:
1. scale in 1 instance when average memory usage < 40%
2. scale out 1 instance when average memory usage > 80%

Please note that the sample above applies only to the [1,2] instance range. For [2,3] or other [X,Y] ranges, please tune the thresholds further based on the logic above.
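As a rough aid for that tuning, the formula A × M = (A - 1) × N can be rearranged to find the highest observed value at which a scale in would not project above the scale out threshold. The helper below is a hypothetical sketch that assumes the fixed-workload model described above and a scale in step of 1 by default; it is not part of any Azure tooling.

```python
def max_scale_in_threshold(scale_out_threshold: float, count: int, step: int = 1) -> float:
    """Largest observed value at which scaling in from `count` by `step`
    does not project above the scale out threshold."""
    remaining = count - step
    if remaining < 1:
        raise ValueError("scale in would leave no instances")
    # From A * M = (A - step) * N with N <= scale_out_threshold:
    # M <= scale_out_threshold * (A - step) / A
    return scale_out_threshold * remaining / count


# Scale out rule from the thread: average memory usage > 80%.
for count in (2, 3, 4):
    ceiling = max_scale_in_threshold(80, count)
    print(f"{count} -> {count - 1}: scale in threshold at most {ceiling:.1f}%")
```

For 2 -> 1 this gives 40%, matching the suggested rule above. Since a profile typically has a single scale in threshold for all instance counts, the transition with the lowest ceiling (here 2 -> 1) is the one that limits it.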

bowen5 commented 9 months ago

I’m afraid it is by design that scale in and scale out do not share the same “avoid flapping” logic. Refer to Autoscale flapping - Azure Monitor | Microsoft Learn:

To ensure adequate resources, checking for potential flapping doesn't occur for scale-out events. Autoscale will only defer a scale-in event to avoid flapping.

The design choice is that “ensuring customers have sufficient resources” takes priority over “an identical algorithm for scale in and scale out”, so the overall logic tends toward higher instance counts.
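The asymmetry can be illustrated with a minimal sketch of a single evaluation. This assumes the two rules from this issue (scale in below 65, scale out above 80) and a scale step of 1; the `decide` function and its parameters are hypothetical and only model the behavior described in this thread, not the real autoscale engine.

```python
def decide(observed: float, count: int,
           scale_in_below: float = 65, scale_out_above: float = 80,
           min_count: int = 1) -> str:
    """Return 'scale_out', 'scale_in', or 'none' for one evaluation."""
    # Scale out: no flapping check, so resources are added as soon as the rule matches.
    if observed > scale_out_above:
        return "scale_out"
    if observed < scale_in_below and count > min_count:
        # Scale in: project the metric onto one fewer instance first.
        projected = observed * count / (count - 1)
        # Defer the scale in if the projected value would trigger scale out.
        if projected > scale_out_above:
            return "none"
        return "scale_in"
    return "none"


print(decide(82.0, count=1))    # scale_out (not blocked, although 82 / 2 = 41)
print(decide(52.795, count=2))  # none (projected 105.59 > 80)
print(decide(39.0, count=2))    # scale_in (projected 78 <= 80)
```

The first call shows why the 82% example still scales out: the projected 41% is never computed for scale out events.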

Azure / Azure-Spring-Apps

Autoscale observed value being doubled during scale in evaluation #50