failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM
https://failsafe.dev
Apache License 2.0
4.17k stars 297 forks source link

Issue while using Timeout #255

Closed ke7insud closed 3 years ago

ke7insud commented 4 years ago

Hi

I have tried to post the question on StackOverflow but didn't get the reply, so I am trying here also.

https://stackoverflow.com/questions/62317279/jodah-failsafe-library-timeout-is-taking-longer-then-expected

Can you please go through it and let me know what am I doing wrong

whiskeysierra commented 4 years ago

I answered on SO.

On Tue, Jun 16, 2020 at 8:18 AM ke7insud notifications@github.com wrote:

Hi

I have tried to post the question on StackOverflow but didn't get the reply, so I am trying here also.

https://stackoverflow.com/questions/62317279/jodah-failsafe-library-timeout-is-taking-longer-then-expected

Can you please go through it and let me know what am I doing wrong

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/jhalterman/failsafe/issues/255, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADI7HIUZS3H53GDNORD3KTRW4FEDANCNFSM4N7JMO6A .

gm2211 commented 4 years ago

@whiskeysierra I think it's slightly confusing that the Timeout policy is completely ignored for blocking calls - maybe a more desirable behavior would be to have a precondition in get(...) like

Preconditions.checkState(
    policies.stream().noneMatches(PolicyUtils::isTimeout), 
    "Please use .withMaxDuration(...) instead of a timeout when using a blocking method like get(...). "
     + "Note that .withMaxDuration will not interrupt execution")`

or at least log.warn since starting to throw would be a breaking change.

Also, if the evaluation of the timeout policy is scheduled on a different thread than the thread that executes the blocking call, that different thread could in theory interrupt the thread executing the blocking call.

Anyways, leaving that aside, I'm also having some issues with Timeouts with getAsync(...):

RetryPolicy<Boolean> retryPolicy = new RetryPolicy<Boolean>()
    .handleResultIf(retry -> retry) 
    .withMaxAttempts(-1);
Timeout<Boolean> timeout = Timeout.<Boolean>of(Duration.ofSeconds(1))
    .withCancel(true)
    .onFailure($ -> System.out.println("failure"))
    .onSuccess($ -> System.out.println("success"));
Failsafe
    .with(retryPolicy, timeout)
    .getAsync(ctx -> !ctx.isCancelled())
    .get();

This hangs forever (constantly printing "success"), while I would expect the timeout to cause the context to become canceled after 1 second, thus completing the future. Am I missing something?

Tembrel commented 4 years ago

Answering second part first, which will lead to a response to the first part. Some of this probably belongs on StackOverflow.

Am I missing something?

Several things, I believe.

From the way you wrote the code, I think you wanted to have the task repeat under a retry policy and have the timeout apply to the repeated execution. For that you need to change the order of the policies from with(retryPolicy, timeout) to with(timeout, retryPolicy). (See @whiskeysierra's comment on #253, and #254 and its related issues for more about why this is.) With this simple change, your code simply prints success once and exits, which I'm guessing is what you were expecting. (It works that way with the synchronous get instead of getAsync, too.)

But why does it behave the way it does with the policies in the order retryPolicy, timeout?

Most directly, the !ctx.isCancelled() expression is evaluated when the asynchronous task executes, but the timeout policy isn't scheduled to fire until 1 second has elapsed. Barring some unlikely execution scheduling, the task will complete successfully before the timeout fires, returning true, and the retry policy will obligingly retry ad infinitum.

If you tweak your code as follows to simulate some work during the task execution, it repeatedly fails rather than repeatedly succeeding (I limited the attempts to 3 to avoid the infinite loop):

...
    .withMaxAttempts(3);
...
Failsafe
    .with(retryPolicy, timeout)
    .getAsync(ctx -> {
    try {
        TimeUnit.SECONDS.sleep(2);
    } catch (InterruptedException ex) {
        // XXX swallow exception; bad policy in practice
    }
    return !ctx.isCancelled();
    })
    .get();

This code prints failure 3 times and exits by re-throwing the last TimeoutExceededException, which is how the retry policy reports its failure after it runs out of attempts.

Secondly, the argument to withCancel is not true for cancel and false for do-not-cancel. It is whether in addition to cancelling the Failsafe machinery is allowed to interrupt the execution. The code above behaves identically if you set withCancel(false), but you can see the difference if you run this code with different values for CAN_INTERRUPT:

...
    .withCancel(CAN_INTERRUPT)
...
Failsafe
    .with(retryPolicy, timeout)
    .getAsync(ctx -> {
    boolean cancelled = false;
    boolean interrupted = false;
    try {
        TimeUnit.SECONDS.sleep(2);
    } catch (InterruptedException ex) {
        // XXX swallow exception; bad policy in practice
        interrupted = true;
    }
    cancelled = ctx.isCancelled();
    System.out.printf("cancelled=%s, interrupted=%s%n", cancelled, interrupted);
    return !cancelled;
    })
    .get();

With CAN_INTERRUPT set to true, you get 3 instances of:

cancelled=true, interrupted=true

When CAN_INTERRUPT is false, you get 3 instances of:

cancelled=true, interrupted=false

If you comment out the withCancel(CAN_INTERRUPT) line, you get 3 instances of:

cancelled=false, interrupted=false

Perhaps surprisingly, it makes no difference what value is returned by the task. That's because the retry policy above uses handleResultIf, which still allows for failure conditions to cause a retry, in this case, TimeoutExceededException. If you use something like the following, the retry will not try to handle the timeout (note that we test the result for nullity before testing its value):

    .handleIf((retry, f) -> retry != null && retry)

Also, and here's the beginning of a response to the first part of your comment:

I think it's slightly confusing that the Timeout policy is completely ignored for blocking calls

Timeout is not completely ignored for blocking calls. @whiskeysierra wrote on SO:

Timeout is best used in conjunction with an asynchronous operation, usually indicated by *Async methods.

And that's a reasonable rule of thumb, but in this case, changing getAsync to get (and omitting the final get) produces the same results. It works as you supposed:

...if the evaluation of the timeout policy is scheduled on a different thread than the thread that executes the blocking call, that different thread could in theory interrupt the thread executing the blocking call.

Note that, as described above, whether interruption is used depends on the argument passed to withCancel. The use of interruption can be extremely delicate, and not all blocking code is responsive to interruption, but if your tasks stick to standard library APIs that use throws InterruptedException to indicate that kind of responsivness then it should be possible to use timeout policies with them.

But even using withCancel(false), if the running task checks whether the execution context has been cancelled, it can be used in conjunction with a timeout policy. The reason to recommend asynchronous execution with timeouts is that the main thread is free to continue with other work while the timed execution proceeds.

jhalterman commented 3 years ago

For posterity, in addition to what @Tembrel mentioned, I should point out that the docs discuss cooperative cancellation and interruption, which can be used with either sync or async executions:

https://failsafe-lib.github.io/execution-cancellation/#handling-cancellation

Feel free to re-open this issue if needed.