failsafe-lib / failsafe

Fault tolerance and resilience patterns for the JVM
https://failsafe.dev
Apache License 2.0
4.19k stars 297 forks source link

Use JDK 9's CompletableFuture.delayedExecutor for scheduled executions, when available #197

Open jhalterman opened 5 years ago

jhalterman commented 5 years ago

As suggested here.

See also: https://github.com/ben-manes/caffeine/blob/fb89d910bc4199ef63b621bb3772d6dcf7d26045/caffeine/src/main/java/com/github/benmanes/caffeine/cache/Scheduler.java#L104

jhalterman commented 5 years ago

I was just starting on this and realized that delayedExecutor returns Executor rather than ScheduledExecutorService, which is a problem since Executor doesn't give me a Future to work with. It's important that users should be able to cancel scheduled executions.

It looks like OpenJDK's delayedExecutor returns an executor that delegates to a ScheduledExecutorService internally. I could always setAccessible and reflect it, falling back if it fails, but Failsafe hasn't approached that level of hackiness so far, so I'm hesitant to go there :)

ben-manes commented 5 years ago

The intent is CompletableFuture.supplyAsync(supplier, executor) which can be canceled.

whiskeysierra commented 5 years ago

The re-scheduling part needs to be done by hand though, right?

whiskeysierra commented 5 years ago

Pretty much this: https://github.com/jhalterman/failsafe/issues/106#issuecomment-450060750

But the Scheduler interface in Failsafe is simpler than ScheduledExecutorService, so the implementation would be shorter I guess.

ben-manes commented 5 years ago

Yes, I expected just using Failsafe's scheduler would be good enough for its internal usage, which becomes pretty much a slightly more complex version of,

return CompletableFuture.supplyAsync(callable::call,
    CompletableFuture.delayedExecutor(delay, unit));
whiskeysierra commented 5 years ago

Ah, right. I just realized that Failsafe only ever uses ScheduledExecutorService#schedule and is doing the re-scheduling by hand already anyway.

jhalterman commented 5 years ago

Re:

return CompletableFuture.supplyAsync(callable::call,
    CompletableFuture.delayedExecutor(delay, unit));

I dug into the JDK a bit and it's not exactly ideal the way things work for this use case since cancellation is not propagated to the underlying Delayer's Future.

So cancelling the future causes the Runnable/Supplier that you pass to runAsync/supplyAsync not to be called, but the delayed execution still happens since there's no way to cancel the underlying Delayer's Future which is not captured. A quick test debug confirms this. This is kind of unfortunate that delayed executions would still happen, even when they're cancelled, even if they just perform a noop.

ben-manes commented 5 years ago

yeah, it doesn't propagate the setRemoveOnCancelPolicy(true) to remove on cancelation. I thought about reporting it originally, but it's also not incorrect. The default for any ScheduledThreadPoolExecutor is to not remove on cancel, so most user created ones would act the same unless explicitly set, which few do. That includes your scheduler created by DelegatingScheduler, which means the impact is identical.

To remove on canceling of the CompletableFuture via delayedExecutor would require that the internals do an instanceof check in a hot path of its internals. I can understand why they might not want to do that, so I didn't ask on concurrency-interest when I had dug into the semantics.

whiskeysierra commented 5 years ago

In the end it's not that big of a deal, is it? It will do the right thing, just burn a couple of CPU cycles for nothing, correct? Sounds like a small price to pay for not having to manage any threads.

ben-manes commented 5 years ago

The argument could be that it increases the min-heap size, which scales at O(lg n). That's likely more costly than the thread context switches. If Failsafe schedules work very frequently, e.g. on every user request, then perhaps that can pile up? If so, then a single scheduler will have a lot of dead nodes. For example see this benchmark.

However since past releases and almost all usages would have relied on a scheduler that doesn't proactively remove cancelled tasks, and that a user-specified scheduler can be supplied if a perf issue is found in their use-case, I also think it is a great bargain to avoid creating threads.

jhalterman commented 5 years ago

That includes your scheduler created by DelegatingScheduler, which means the impact is identical

I realized this as well when I was digging into that JDK code, so I added setRemoveOnCancelPolicy(true) which was the behavior I intended. Despite Failsafe's internal delayer being one more thread, being able to properly cancel and remove cancelled executions seems like a good thing in general, in which case, I'm not sure it makes sense to use delayedExecutor.

ben-manes commented 5 years ago

Since delays are in seconds the cost is trivial, as @whiskeysierra noted. It may be cheaper as there is less synchronization, hard to know without a benchmark. The static thread causes issues and, imho, isn’t doing anything of value. Given a lack of benchmarks one way or the other this is all personal preferences, except that there are known drawbacks to managing the thread explicitly.

jhalterman commented 5 years ago

What kind of issues concern you re: the static thread?

My main concern isn't so much the runAsync -> delayedExecutor handoff, it's that delayedExecutor tasks may pile up since they can't be cancelled and removed.

ben-manes commented 5 years ago

See https://github.com/jhalterman/expiringmap/issues/61 as one example. The ability to isolate and unload a classloader was historically very popular and can still be useful in some cases. I do not do this myself, as I think it causes more problems compared to an embedded container. However, I also think it is a reasonable expectation to be friendly to class unloading as someone may have a justifiable reason. I think libraries should try to help users solve their problems and then get out of the way, whereas frameworks of old were frustratingly very dictatorial.

The cancelled tasks would discard their internal state to be very inexpensive. The cost is then a few nodes on a heap and the thread waking up. Since timeouts are in short durations this shouldn't pile up for long, so generally it doesn't seem like a problem.

I very much agree with making it pluggable to let users workaround problems if they encounter them. So either way if a user has a problem they can resolve it and discuss if there is a general fix. However, the static thread blocking class unloading is not as easy to fix if Failsafe is used through a dependency, where its configuration cannot be overridden. That means either having a global static setting that the user can override early on or making the default more tolerant.

jhalterman commented 5 years ago

I think libraries should try to help users solve their problems and then get out of the way, whereas frameworks of old were frustratingly very dictatorial.

Agree 💯

Re: the class unloading problem, I wonder if users would run into it using the JDK 9 common pool's static delayer anyways (I would think so). In the case of Failsafe, the internal delayer is only created if it's needed. You can configure your own ScheduledExecutorService via .with(scheduler). I plan to improve the discussion of schedulers and threading in the project docs to make these tradeoffs more clear.

The cancelled tasks would discard their internal state to be very inexpensive.

It's only the outer runAsync tasks that would be cancellable with the runAsync(runnable, CompletableFuture.delayedExecutor(...)) approach, the inner delayedExecutor tasks would still pile up, right? This makes me wonder if strong references could prevent the tasks from being GC'd, as is easy to reproduce when cancelling ScheduledExecutorService tasks where setRemoveOnCancelPolicy is false. Agree, some testing is needed...

ben-manes commented 5 years ago

Re: the class unloading problem, I wonder if users would run into it using the JDK 9 common pool's static delayer anyways (I would think so)

The JDK can cheat by using a different classloader for its runtime, so that turns out not to be a problem :)

the inner delayedExecutor tasks would still pile up, right?

Yes. The cancel should clean up any user-provided state, like the given runnable task. I don't know about that blog post, since later JDKs do clean up in both CompletableFuture and ScheduledFutureTask. So the GC effects should be just the delayed future tasks which are lightweight at that point. But yeah, tests are best at this point.

whiskeysierra commented 5 years ago

You can configure your own ScheduledExecutorService via .with(scheduler).

I just realized that I have another reason to supply my own and that is the ability to transport thread locals, e.g. needed for (open) tracing.