Azure / AppConfiguration-DotnetProvider

The .NET Standard configuration provider for Azure App Configuration
https://github.com/Azure/AppConfiguration
MIT License
77 stars 34 forks source link

Remove MaxAttempts for CalculateBackoffTime #507

Closed zhiyuanliang-ms closed 6 months ago

zhiyuanliang-ms commented 6 months ago

The variable name MaxAttempts is misleading. This variable is used for preventing the case that the attempts is larger than the bit number of long int. For example, 1 << 65 will give the same result as 1 << 1. I think the name MaxSafeShiftBit or something similar will be more informative. The name MaxAttempts is quite misleading because it feels like if the attempts reach the MaxAttempts, the retry will be ended. However, in the code,

//
// IMPORTANT: This can overflow
double calculatedMilliseconds = Math.Max(1, minDuration.TotalMilliseconds) * ((long)1 << Math.Min(attempts, MaxAttempts));

if (calculatedMilliseconds > maxDuration.TotalMilliseconds ||
        calculatedMilliseconds <= 0 /*overflow*/)
{
    calculatedMilliseconds = maxDuration.TotalMilliseconds;
}

In fact, attempts will never reach MaxAttempts, because maxDuration is usually very small.

In Python Provider, the implementation is the same as dotnet provider. The max_attempts = 63 makes no sense in python provider to keep the same behavior since there is no overflow in python.

In this PR, I suggested using Math.Pow instead of << operation. I tested the performance difference, it is only about 0.01 ms. Double type will not have the overflow issue. The "overflow" will return double.PositiveInfinity which is larger than any double number. We can remove the code of handling overflow.

Also, I found the discussion about the exponential strategy which called out the possibility to adjust the factor.

Also, I feels Math.Max(1, minDuration.TotalMilliseconds) weird. I suggest adding some out range check for all TimeSpan parameters.

Eskibear commented 6 months ago

I agree with most of above points.

Btw I just glanced at the original discussion mentioned above, where a good analysis of factor was done and 1.5 was decided to be the value. Just wondering why it eventually used 2 (with bitwise operation)? Performance or any other concern?

jimmyca15 commented 6 months ago

Just wondering why it eventually used 2 (with bitwise operation)? Performance or any other concern?

Because 2 is the de-facto approach for exponential backoff. Using 1.5 instead was proposed as an attempt to increase the amount of retry attempts performed on startup. An offline discussion was had about this. Changing the exponential growth factor isn't a real solution to the problem that was trying to be solved. After discussion to pin down what was actually desired when the original suggestion was made, it revealed that the desire was to have a max time between attempts on startup. So the final design was to keep an exponential growth factor of 2 over the long term, but design certain cut off points on startup where the max time between attempts is determined.

avanigupta commented 6 months ago

In this PR, I suggested using Math.Pow instead of << operation. I tested the performance difference, it is only about 0.01 ms.

Bitwise operations are much faster than arithmetic operations because bit manipulation is optimized at the hardware level. Math.Pow is more generic and designed to handle floating point calculations better. Hence, it's not optimized for the special case that is calculating powers of 2. I think we should keep that optimization until we decide to change the exponential factor of 2.

The name MaxAttempts is quite misleading because it feels like if the attempts reach the MaxAttempts, the retry will be ended.

I agree with this, we can change the name of the variable to clarify the intended behavior.

avanigupta commented 6 months ago

The max_attempts = 63 makes no sense in python provider to keep the same behavior since there is no overflow in python.

@mrm9084 and I discussed this a while ago. Matt mentioned that 1 << 14284 can overflow in Python. Realistically, we will never reach that limit but it's still a good practice to have some upper bound for safeguarding. That's why we chose 63 to be consistent with other providers.

mrm9084 commented 6 months ago

The max_attempts = 63 makes no sense in python provider to keep the same behavior since there is no overflow in python.

@mrm9084 and I discussed this a while ago. Matt mentioned that 1 << 14284 can overflow in Python. Realistically, we will never reach that limit but it's still a good practice to have some upper bound for safeguarding. That's why we chose 63 to be consistent with other providers.

I'll still update the variable name as I agree it isn't a good one.

RichardChen820 commented 6 months ago

Honestly, I'm not a big fan of updating the variable name, the new name does not help more in intuitively understanding its purpose. I still need to understand it by reading the code of how it's used. But I'm not against this change.

zhiyuanliang-ms commented 6 months ago

@RichardChen820 The name "MaxSafeShiftBit" is more informative and indicate this variable is only for safeguarding the shift operation. When I was reading the code of dotnet provider, I was confused about the "MaxAttempts", I expected there would be some logic to end refresh when Attempts reaches MaxAttempts.