failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM
https://failsafe.dev
Apache License 2.0

Regression: Failsafe 2.4.x getStageAsync may hang #303

Closed timothybasanov closed 1 year ago

timothybasanov commented 2 years ago

Something changed after 2.4.2: combining Timeout and RetryPolicy with getStageAsync() sometimes makes Failsafe hang. Here is an example that reproduces on 2.4.x but not on the master branch:

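import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.CompletableFuture;
import net.jodah.failsafe.Failsafe;
import net.jodah.failsafe.RetryPolicy;
import net.jodah.failsafe.Timeout;

// Each attempt times out after 100 ms; up to 2 retries with 10-30 ms backoff.
Timeout<Integer> timeout = Timeout.of(Duration.ofMillis(100));
RetryPolicy<Integer> retryPolicy = new RetryPolicy<Integer>()
        .withBackoff(10, 30, ChronoUnit.MILLIS)
        .withMaxRetries(2);
// The supplied stage takes 500 ms, so every attempt exceeds the timeout.
var result = Failsafe.with(retryPolicy, timeout).getStageAsync(() -> CompletableFuture.supplyAsync(() -> {
    try {
        Thread.sleep(500);
        return 1;
    } catch (InterruptedException e) {
        throw new RuntimeException("Interrupted");
    }
}));
System.out.println("Result=" + result.join()); // Hangs here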

It would be nice to have a fix on the 2.4.x branch for people who may find it hard to migrate to 2.5.x for some time.
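For anyone stuck on 2.4.x in the meantime, one possible stopgap is to bound the wait on the returned future instead of calling join() with no limit. The sketch below uses only the JDK. The class and method names (BoundedJoin, joinWithin) are hypothetical, and the never-completed future merely stands in for a hung Failsafe result. It does not fix the underlying bug; it only keeps the caller from blocking forever.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Failsafe.
class BoundedJoin {
    /**
     * Waits at most maxWait for the future. Returns the value if it completed
     * in time, or an empty Optional if the wait timed out (the future is then
     * cancelled so nothing else keeps waiting on it).
     */
    static <T> Optional<T> joinWithin(CompletableFuture<T> future, Duration maxWait) {
        try {
            return Optional.ofNullable(future.get(maxWait.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new CompletionException(e);
        } catch (ExecutionException e) {
            throw new CompletionException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Assumption: a future that is never completed models the hang above.
        CompletableFuture<Integer> stuck = new CompletableFuture<>();
        System.out.println("Completed in time: "
                + joinWithin(stuck, Duration.ofMillis(200)).isPresent());
    }
}
```

In the reproduction above, the last line would become something like joinWithin(result, Duration.ofSeconds(1)), with the bound chosen to exceed the worst-case total of all attempts and backoff delays.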

jhalterman commented 2 years ago

Thanks for filing. 2.5.x should be an easy upgrade for most people unless they're implementing custom policies. The reason I haven't released 2.5.0 yet is that I'm trying to decide whether to skip that version and go straight to 3.0, which ends up changing a few of the SPI pieces that were in flux in 2.5.0.

jhalterman commented 2 years ago

It doesn't look like there will be an easy way to solve this and the other Timeout-related problems that were fixed in 3.0 (and the 2.5 branch) without the internal changes that went along with them. I could release a 2.5, but at the moment it would include a bunch of other changes: https://github.com/failsafe-lib/failsafe/blob/2.5.0/CHANGELOG.md#250, so it's probably best in your case to move straight to 3.0 when you can.

jhalterman commented 1 year ago

Closing this since I don't think I'll be doing more work on the 2.x branch.