[question/enhanchment] is it possible to reset the retry delay?

failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM

https://failsafe.dev

Apache License 2.0

4.16k stars 295 forks source link

[question/enhanchment] is it possible to reset the retry delay? #341

Open bertbaron opened 2 years ago

bertbaron commented 2 years ago

I'm using failsafe for a job that should run 'forever' with the following code:

        RetryPolicy<Object> retryPolicy = RetryPolicy.builder()
                .withMaxRetries(-1)
                .withBackoff(Duration.ofSeconds(2), Duration.ofMinutes(2), 2.0)
                .onFailedAttempt(event -> log.warn("Backoff from failure", event.getLastException()))
                .build();
        var future = Failsafe.with(retryPolicy)
                .with(executor)
                .runAsync(this::runOnce);

Is there a way to reset the backoff delay to the initial delay when the job has been running for a given duration (say half an hour)? Or would it be an option to add a backoffReset duration?

If there is a better way to achieve my use case then please let me know.

jhalterman commented 2 years ago

There's not a way to reset a backoff currently, but one alternative is you can use a delay function to easily compute your own backoff delay and reset it if you wish. Have a look at RetryPolicyBuilder.withDelayFn. A delay function has access to the ExecutionContext which includes the attempt count.

bertbaron commented 2 years ago

Thanks for the quick response, I'll have a look into that!

bertbaron commented 2 years ago

I couldn't get it to work with a custom delay function because the context doesn't seem to provide enough information. When the previous delay would be part of the context the backoff algorithm, including delay factor and reset, could be written in a custom delay function. Another possibility would be an option to configure a reset duration in the RetryPolicyBuilder. Since I couldn't find an easy way to implement the former I created a draft PR with the latter option: https://github.com/failsafe-lib/failsafe/pull/342

If there is any change to get such an option merged I'll complete the PR with the necessary tests etc. Otherwise feel free to close the PR. I really like library and even without this option I think I'll settle with it for now, maybe working around this limitation in another way.

whiskeysierra commented 2 years ago

When the previous delay would be part of the context

You could just keep record of the last computed duration within your function, couldn't you? (This would make the function ~stateless~ stateful, but you would be independent of any updates to get that feature into failsafe.)

bertbaron commented 2 years ago

True. I didn't really consider that option because the contract explicitly states that "The delayFunction must ... not have side-effects, and always return the same result for the same input", but looking at the code it should work since its only called once per delay. Another workaround I consider is to just put an additional computed delay within my Runnable task itself. So indeed I don't really depend on a change of the library. It's just a feature you could consider. But as said, feel free to withdraw it.