eclipse / microprofile-fault-tolerance

microprofile fault tolerance
Apache License 2.0
130 stars 64 forks source link

Add support for exponential backoff retry #220

Open cen1 opened 6 years ago

cen1 commented 6 years ago

Failsafe already has support for this.

Preferred and less wasteful policy than fixed period when the target service could be under heavy load and unresponsive for longer periods of time.

Emily-Jiang commented 6 years ago

sounds useful. +1 from me.

Emily-Jiang commented 6 years ago

We can add one more method to configure the backoffFactor with the default value of 2

carlosdlr commented 6 years ago

+1

brunobat commented 6 years ago

+1

antoinesd commented 6 years ago

Should be ok for our implementation. So +1 for me

jeanouii commented 6 years ago

Very useful. We can start agressive at first and then get slower and slower until we reach a max value and decide it's dead :)

cen1 commented 6 years ago

I see the pull request is like 20 LOC, what is the holdup for the merge?

Emily-Jiang commented 6 years ago

We talked this PR and had some questions. @antoinesd is going to update his PR accordingly. I will try to take this over, if Antoine cannot get around to do it.

lordofthejars commented 4 years ago

@Emily-Jiang I am really interested in this feature as well, but instead of focusing on backoff retry. I'd create some kind of interface called RetryPolicy or something like that, so users can implement their own strategies, notice that you could try to use the exponential backoff policy but maybe you prefer a Fibonacci one or an incremental one.

So the API should look like:

@Retry(retryPolicy=FibonacciRetryPolicy.class)
Ladicek commented 4 years ago

When I was looking at this a while ago, I realized that even adding just exponential backoff results in adding not-exactly-intuitive parameters to @Retry. Perhaps externalizing the backoff policy into a separate class like @lordofthejars suggests would be an elegant way to solve that, but unfortunately it's less declarative/configuration.

I personally would lean towards using more than one annotation. Just @Retry means constant backoff, configured by delay / delayUnit. We could add an annotation @ExponentialBackoff that, together with @Retry, would mean exponential backoff, with initial backoff configured by @Retry.delay, multiplication factor e.g. @ExponentialBackoff.factor, and max delay e.g. @ExponentialBackoff.maxDelay. If more backoff strategies are needed, we could add @FibonacciBackoff etc., but I think exponential is good enough.

lordofthejars commented 4 years ago

Well I have been checking in Microsoft patterns library and they support three (iterative, constant, exponential) but then if you check Awaitility open source project, you'll see that also supports Fibonacci, so I guess that these four strategies might be interesting to support.

Emily-Jiang commented 4 years ago

Let's discuss further next Tuesday and agree on the solution.

Azquelt commented 4 years ago

We discussed this on the call.

We thought:

For these reasons, we thought we could cover most use cases by just adding one parameter to @Retry which specified the backoff strategy to use.

Usage:

@Retry(delay=500, delayGrowth="fibonacci")

This allows us to add new strategies to the spec in the future and allows implementers to provide extra strategies if desired.

Ladicek commented 4 years ago

Personally, I don't really like the string approach, and some of the strategy names are too easy to misspell (exponential, fibonacci). I think we should use an enum.

On the other hand, I can see how string is appealing -- it can possibly even include a FQN of a class implementing some interface, which allows for ultimate flexibility. But there's an easy way how to allow that with an enum -- extra enum value CUSTOM, which would require a configuration property (not expressible via annotation) pointing to a class.

Emily-Jiang commented 4 years ago

As promised, I investigated this a bit more. Resilience4j has exponential backoff and random exponential backoff. Failsafe has also exponential backoff . I think we need to support similar exponential backoff as a minimum.

Emily-Jiang commented 4 years ago

We can see that there's immediate benefit to the exponential backoff strategy, but that other strategies might make more sense for certain scenarios.

@Ladicek found a paper concluding that a fibonacci backoff strategy was better than exponential backoff for some of the scenarios it examined.

It is not 100% accurate. The statement says:

One of these algorithms is Fibonacci increment backoff (FIB), FIB algorithm achieves higher throughput than the exponential backoff that is used by the standard IEEE 802.11 when it used in a mobile ad hoc network.

The performance is better when used in a mobile ad hoc network, which is very narrow use case. Remember the exponential backoff is used by the standare IEEE 802.11, which means a lot and it is very widely adopted strategy, as demonstrated by both failsafe and Resilience4j.

Ladicek commented 4 years ago

That's an out-of-context quote, as the entire paper focuses on that particular scenario. There are other papers that examine various backoff strategies in different contexts, e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5375778/

What I mainly wanted to say is that only adding exponential backoff in the proposed way means we won't be able to add other backoff strategies in the future.

Currently, I even think that the amount of time we spent debating this shows that the design of @Retry is actively hostile against supporting more than 1 or 2 backoff strategies and should probably be revisited entirely.

cen1 commented 4 years ago

I would give a +1 to the single property with .class proposal since it seems to cover everything. It is type safe and extensible. Enum is nicer for regular case, uglier for custom case.

Another thought that came on my mind is whether the policy mechanism has wider use outside of @Retry, for example in @CircuitBreaker or anywhere else in fault tolerance where delays are defined.

Ladicek commented 4 years ago

At this point, I can think of several options that make some sense to me. I wanted to summarize them here. Beware, I don't fear of breaking changes :-)

  1. @Retry is already pretty big, and already includes one backoff strategy (constant) which has 4 (four!) configuration attributes. Backoff strategies should be separated out of the @Retry annotation:

    @Retry(maxRetries, maxDuration, maxDurationUnit, retryOn, abortOn)
    @RetryWithConstantBackoff(delay, delayUnit, jitter, jitterUnit)
    @RetryWithExponentialBackoff(initialDelay, initialDelayUnit, factor, [maxDelay, jitter, jitterUnit])
    @RetryWithFibonacciBackoff(initialDelay, initialDelayUnit, [maxDelay, jitter, jitterUnit])
    possibly others (polynomial, LILD, LIMD, MILD, MIMD, PLEB, PFB)

    When only @Retry is used, there's no delay between retries. Adding one of the annotations above defines how individual retries should be delayed. Adding more than 1 is tricky to handle, because of the interactions between method-level and class-level annotations and inheritance, but that's a problem we already have.

  2. Variant of above where the @*Backoff annotations are folded into the @Retry annotation. All backoff annotations have default values that correspond to "nothing":

    @Retry(maxRetries, maxDuration, maxDurationUnit, retryOn, abortOn,
        constantBackoff=@ConstantBackoff(...),
        exponentialBackoff=@ExponentialBackoff(...),
        fibonacciBackoff=@FibonacciBackoff(...)
    )

    Obvious downside is that it isn't possible to have sensible default values.

  3. Elaboration on the Class<? extends BackoffStrategy> attribute. At this point, I actually think this could be made pretty reasonable, using MP Config.

    @Retry(maxRetries, maxDuration, maxDurationUnit, retryOn, abortOn, delay, delayUnit, jitter, jitterUnit,
        delayGrowth = ConstantBackoff.class)
    
    interface BackoffStrategy {
        // delay and delayUnit is exactly what was configured for @Retry
        // I'm not married to this method signature (in fact I can see few better ones already), just something to demonstrate
        // jitter is applied outside of this
        Duration nextDelay(long delay, ChronoUnit delayUnit);
    }

    We would then ship these:

    class ConstantBackoff implements BackoffStrategy {
        Duration nextDelay(long delay, ChronoUnit delayUnit) {
            return Duration.of(delay, delayUnit);
        }
    }
    
    class ExponentialBackoff implements BackoffStrategy {
        final long factor;
        final long maxDelay;
    
        long last = -1;
    
        class ExponentialBackoff() {
            this.factor = ConfigProvider.getConfig().getOptionalValue(this.getClass().getName() + ".factor", Long.class).orElse(2L);
            this.maxDelay = ConfigProvider.getConfig().getOptionalValue(this.getClass().getName() + ".maxDelay", Long.class).orElse(Long.MAX_VALUE);
        }
    
        Duration nextDelay(long delay, ChronoUnit delayUnit) {
            last = (last < 0 ? delay : last) * factor;
            if (last > maxDelay) {
                last = maxDelay;
            }
            return Duration.of(last, delayUnit);
        }
    }
    
    class FibonacciBackoff implements BackoffStrategy {
        final long maxDelay;
    
        long lastA = 1;
        long lastB = 1;
    
        class ExponentialBackoff() {
            this.maxDelay = ConfigProvider.getConfig().getOptionalValue(this.getClass().getName() + ".maxDelay", Long.class).orElse(Long.MAX_VALUE);
        }
    
        Duration nextDelay(long delay, ChronoUnit delayUnit) {
            long current = lastB;
            long next = lastA + lastB;
            lastA = lastB;
            lastB = next;
            current *= delay;
            if (current > maxDelay) {
                current = maxDelay;
            }
            return Duration.of(current, delayUnit);
        }
    }

    This is a bit monstrous and requires using MP Config to configure the additional strategies. I have intentionally used this.getClass().getName() as part of the config key so that people can just create an empty subclass if they want different configuration for specific cases.

I would personally prefer option 1, obviously :-)

asantaga commented 2 years ago

Hey guys, can we have an update on this ER, it is something we would need for our project.

Ladicek commented 2 years ago

There's been no movement on this on the specification front. Depending on which implementation of MicroProfile Fault Tolerance you use, you may already have additional backoff strategies at your disposal. (Shameless plug: SmallRye Fault Tolerance provides exponential backoff, Fibonacci backoff, and completely custom backoff since version 5.2.0, see https://smallrye.io/blog/fault-tolerance-5-2/)