failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM
https://failsafe.dev
Apache License 2.0
4.16k stars 295 forks source link

Incorrect behaviour with getStageAsync when handling failures. #343

Closed AnnieTheEagle closed 2 years ago

AnnieTheEagle commented 2 years ago

It seems that while using Failsafe, if you configure a policy to count certain values as failures, if at the limit of attempts, there's still no successful value, it will still complete the future with the last value, instead of failing the future as retry attempt exceeded.

Say you have a policy like:

RetryPolicy<TestObject> retryPolicy = RetryPolicy.<TestObject>builder()
    .handleResultIf(result -> result == null || result.someField == null)
    .withMaxAttempts(3)
    .onFailedAttempt(a -> LOGGER.warn("Failed attempt {}", a.getAttemptCount())
    .onFailure(a -> LOGGER.error("Couldn't get data after 3 attempts"))
    .build();

return Failsafe.with(retryPolicy).getStageAsync(() -> this.test());

And the test method is:

private CompletableFuture<TestObject> test() { 
    return CompletableFuture.completedFuture(null);
}

Then the resultant CompletableFuture<TestObject> created by getStageAsync will still complete (with value null) causing potential NPEs down the line.

It works properly with exceptions (e.g. if I changed the test method to return CompletableFuture.failedFuture(new RunTimeException("test"));

I'm thinking that the future returned by getStageAsync should fail if retry attempts are exceeded (and thus the onFailure listener is called).

jhalterman commented 2 years ago

Failsafe will always return the last execution result after retries are exceeded. If that's a null value from a CompletableFuture, that's what you'll get. If it's an exception, that's what you'll get. It sounds like in your case, you'd prefer to have an exception rather than a null value, to avoid NPEs? In that case, you might want to wrap your RetryPolicy in a Fallback, ex:

RetryPolicy<TestObject> retryPolicy = RetryPolicy.<TestObject>builder()
  .handleResult(null)
  .onFailedAttempt(a -> System.out.println("Failed attempt " + a))
  .onFailure(a -> System.out.println("Failed " + a))
  .build();
Fallback<TestObject> fallback = Fallback.<TestObject>builderOfException(e -> new RuntimeException("retries exceeded"))
  .handleResult(null)
  .build();

// Performs 3 attempts then completes the future with a RuntimeException
Failsafe.with(fallback).compose(retryPolicy).getAsync(() -> null);

Here, the fallback will handle any null result from the inner retryPolicy and return a RuntimeException instead. I hope that makes sense. Let me know if I misunderstood something.

jhalterman commented 2 years ago

I just updated the retry policy docs to describe the default behavior a bit better: https://failsafe.dev/retry/

I also added this to the FAQ: https://failsafe.dev/faqs/#how-to-i-throw-an-exception-when-retries-are-exceeded

Thanks for bringing this up.