App-vNext / Polly

Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. From version 6.0.1, Polly targets .NET Standard 1.1 and 2.0+.
https://www.thepollyproject.org
BSD 3-Clause "New" or "Revised" License
13.34k stars 1.22k forks source link

Calculated break duration for Circuit breaker and Advance Circuit breaker #653

Closed MattMinke closed 10 months ago

MattMinke commented 5 years ago

CircuitBreaker and AdvanceCircuitBreaker currently only allow for a fixed break duration. I would like to be able to calculated the break duration based on the number of consecutive "unhealthy" requests made. This would allow the duration to be increased as the downstream system continues to be unavailable. With this I would be able to increase the duration between attempts to interact with the downstream system over time hopefully allowing it to recover. With this I would also be able to reduce log entries for failed requests to the downstream service.

The use case I see for this is a system that is going to be down for awhile do to some catastrophic issue. Like the Retry policy this allows us to wait longer each time before attempting to interact with the system, and using Math.Min allows us to set an upper limit on the wait duration.



Example Usage

Policy.Handle<Exception>().CircuitBreaker(
                exceptionsAllowedBeforeBreaking: 5,
                durationOfBreak: unhealthyAttempts => TimeSpan.FromSeconds(Math.Min(20 + Math.Pow(2, unhealthyAttempts), /* Max */400)));
reisenberger commented 5 years ago

( Courtesy reply: commitments mean I may not get to respond to this until next week. Thanks. )

reisenberger commented 4 years ago

(EDIT: Deleted a June response of mine with a sub-optimal suggestion for this feature.)


We'd be happy to consider a PR for this feature.

A metric suitable for both original and advanced circuit-breaker could be that the calculated break duration takes as an input parameter numberOfConsecutiveHalfOpenTestFailures

0 = the circuit has just transitioned from Closed->Open (ie no half-open test failures yet) 1 = one attempt in half-open state was made and failed, the circuit just transitioned from HalfOpen->Open again 2 = two attempts in half-open state have been made and both failed, the circuit just transitioned from HalfOpen->Open for the second time (etc)

Brondahl commented 4 years ago

Note that I'm interested in picking this task up and submitting a PR.

One concern I have is that we'd probably want to modify all 5 "constructors" for both Sync and Sync version of the Circuit Breaker (and whatever exists for the Adv. version.

That leaves us with 3 options:

None of those is obviously ideal.

Preferences, @reisenberger ?

Opinions on this?

reisenberger commented 4 years ago

Hi @Brondahl ! Please follow the existing factory overloads pattern, creating new overloads where necessary (no breaking changes). We have a separate, longer-term proposal for simplfying the number of policy-configuration overloads. Thanks!

Brondahl commented 4 years ago

Design choice, please! There's a complication with implementing this, because there are 2 diffferent 'kinds' or 'places' of CB duration.

There's the obvious version, where requests are resolving very quickly, and the duration reset is triggered by ConsecutiveCountCircuitController.OnActionFailure, so that Controller can keep track of consecutive failures, and pass that count into Break_NeedsLock.

But there's also the scenario in which the request takes a really long time to resolve, or the nature of the failure is inherently a hang/timeout. In that second case, the duration reset is performed in CircuitStateController.PermitHalfOpenCircuitTest method, but the Interlock.CompareExchange call.

This is tricky, becuase the program context of the PermitHalfOpenCircuitTest call doesn't have a natural, neat, way to share common state with the OnActionFailure call.

I see 3 options:

My instinct is to go with option 2). But happy to take your steer, on your preferred option.

Brondahl commented 4 years ago

The other point that has come up is that the "throw if TimeSpan is non-positive" constraint is more natural to evaluate at Inner-Action-Runtime, rather than Policy-Setup-Time.

Note that a negative Span doesn't actually break anything, it's just not going to achieve anything useful. i.e. the only reason we have it (AFAICT), is so that we can tell the user that they've probably accidentally mis-configured the tool.

Given the many undetectable ways that the user can screw up a Policy config, I'm inclined to simply delete the check, the associated test, and the relevant lines in the Docs.

Is that acceptable? If not, then we keep the setup-time checks on constant durations and either: a) add a runtime check that throws the same exception, but later, for the dynamic durations. b) don't have protection against the dynamic durations being non-positive.

Brondahl commented 4 years ago

@reisenberger ☝️

Brondahl commented 4 years ago

See PR ☝️ :) Not quite "finished", there's a bunch of Tedious "and now do it all again 7 times" stuff to do, and some better tests to write, but the meat of it is in there.

What do you guys think? (@reisenberger )

martincostello commented 1 year ago

@martintmk Is this use case supported in Polly v8?

martintmk commented 1 year ago

Currently not, but it's easy enough to add post V8 release.

https://github.com/App-vNext/Polly/blob/dd14ca630b7614955580e7b26e0adc56f653c694/src/Polly.Core/CircuitBreaker/CircuitBreakerStrategyOptions.cs#L34

public abstract class CircuitBreakerStrategyOptions<TResult>
{
   public Func<BreakDurationGeneratorArguments, ValueTask<TimeSpan>>? BreakDurationGenerator { get; set; }
}

It will be optional generator that can override the fixed BreakDuration.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity. It will be automatically closed in 14 days if no further updates are made.

atawLee commented 11 months ago

https://github.com/App-vNext/Polly/issues/653#issuecomment-1643422975 I will work on a PR for this issue.

martincostello commented 10 months ago

Resolved by #1776.