Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
13.07k stars 1.85k forks source link

Coroutines perform unnecessary thread switches #4040

Open ListenableFuture opened 9 months ago

ListenableFuture commented 9 months ago

Tl;dr - Coroutines seem to have a trade-off between performing extra thread switches versus using Dispatchers.Unconfined, and there doesn’t appear to be an ideal solution

We’re creating a set of CoroutineScopes for our Android application, which may be tied to the lifecycles of various subcomponents, and we’d like to ensure they run asynchronous code efficiently. For reference, the dispatchers are created from our existing Java executors using ExecutorService.asCoroutineDispatcher(), since the executors are already used elsewhere in the application.

Issue 1 - Unnecessary thread switches

Suppose we have a mainCoroutineScope which uses the application’s main thread:

// Thread A
mainCoroutineScope.launch {
  // Main thread
  suspendBackground()
  // Main thread
  suspendBackground()
  // Main thread
}

In the example above, the CoroutineScope will repeatedly switch between Main Thread and Background Thread, even if the code blocks running on the main thread are empty. This enqueues unnecessary work on the main thread, which is a concern in Android, since the main thread is used to render UI and needs to stay responsive. The CoroutineScope will also take longer to complete its tasks, since it needs to wait for the main-threaded tasks to complete, and the main thread is often busy performing other work.

Issue 2 - Reentrancy

To avoid the unnecessary thread switches above, we can use inline dispatch with Dispatchers.Unconfined. This allows the example above to have the following ordering, which no other scope can do:

Thread A -> suspendBackground() -> inline code (background thread) -> suspendBackground() -> inline code (background thread)

In general, Dispatchers.Unconfined appears to be the only way for a CoroutineScope to allow full control over its thread behavior. This indicates our scopes should use Dispatchers.Unconfined by default, however this is discouraged by the Kotlin docs, and introduces concerns with reentrancy.

Comparison with Java

In Java, directExecutor() is the equivalent of Kotlin’s Dispatchers.Unconfined, and with Java’s Futures, the user is forced to choose an executor at each call. However in Kotlin, Unconfined implicitly runs the code inline, so the user must actively choose when to switch executors, otherwise it runs everything like directExecutor() by default. To make Unconfined scopes more explicit like Java, we would need a way to force users to choose a dispatcher for each code block.

Summary

There seems to be a trade-off between extra thread switches versus reentrancy issues, and there doesn’t appear to be an ideal solution. Kotlin provides everything we need to control thread switching, but we’d need to default all our scopes to Unconfined, and users would need to be fairly experienced to ensure all their code is safely dispatched. I’m wondering the Kotlin team’s perspective of this, and what they would recommend. In particular:

dkhalanskyjb commented 9 months ago

This kind of usage questions is better suited for Stack Overflow or Kotlin Slack. The issue tracker is for issues in the coroutines library.

In any case, I completely failed to understand any of the points you're raising. I'll reply to each part separately so you can see which parts I didn't understand and why, and hopefully clarify what you mean.

Issue 1 - Unnecessary thread switches

The functions from kotlinx-coroutines intentionally avoid "thread switches" (called "suspensions" in the coroutines lingo): if a function can avoid suspending, it typically will, so no thread switch will occur. A notable exception is the yield() function, used intentionally for suspending, giving up control of the thread so that other tasks can run on it.

I don't know what suspendBackground is, but it's not part of our library. If you're experiencing suspensions that you think are not actually needed with it, please contact the authors of that function.

even if the code blocks running on the main thread are empty. This enqueues unnecessary work on the main thread

A good solution would be to write code in such a way that there are no empty code blocks between suspensions. If you can show examples of idiomatic code that suspends even when it shouldn't, we can discuss them on a case-by-case basis.

Issue 2 - Reentrancy

I don't understand the ordering shown, or what issue you're trying to highlight. A realistic example with a clarification of what behavior you expect and what you get instead would be very helpful.

In general, Dispatchers.Unconfined appears to be the only way for a CoroutineScope to allow full control over its thread behavior.

I don't know what "full control" that you want in this context is. With Dispatchers.Unconfined, you lose the guarantees that code will execute on the thread that you want. Unconfined should only be used when you fully understand why you need it, it's not the default choice in any sense.

There is Dispatchers.Main.immediate that will behave like Dispatchers.Unconfined if the task is enqueued from the main thread, but like Dispatchers.Main if the task is enqueued from a separate thread. Maybe you want that.

However in Kotlin, Unconfined implicitly runs the code inline [...] otherwise it runs everything like directExecutor() by default.

directExecutor() also runs all code inline, as stated explicitly in its documentation: https://guava.dev/releases/19.0/api/docs/com/google/common/util/concurrent/MoreExecutors.html#directExecutor()

There seems to be a trade-off between extra thread switches versus reentrancy issues, and there doesn’t appear to be an ideal solution. Kotlin provides everything we need to control thread switching, but we’d need to default all our scopes to Unconfined, and users would need to be fairly experienced to ensure all their code is safely dispatched. I’m wondering the Kotlin team’s perspective of this, and what they would recommend.

As you didn't show specific examples of unneeded suspensions, we can't do anything to help you solve them.

Are there any optimizations the language or compiler can do to avoid unnecessary thread switches?

No, because suspensions are often intentional, and when they aren't, they are typically necessary. Removing suspensions during compilation would therefore break the expected behavior without bringing any of the benefits.

Would you recommend Unconfined as the default for CoroutineScopes tied to the lifecycle of subcomponents?

No, you'd be giving up predictable threading behavior if you did that.

Otherwise, is there a way to efficiently call suspending code while avoiding the dangers of inline execution?

You didn't specify what you mean by "efficiently;" subjectively, the most straightforward code that uses kotlinx-coroutines is typically quite efficient. If you measure the performance of your application and notice that the machinery of the coroutines library takes an abnormally large portion of the runtime (more than would be needed if you were to write equivalent code without using the library), then we can treat it as a performance bug and seek to fix it, or maybe give you advice on how to restructure your code so that it utilizes the library better.

ListenableFuture commented 9 months ago

Thanks for the details and taking the time to respond, and I’m sorry for the confusion about what I intended to ask. Let me include an example which I think will help clarify the points in my original post.

In the original post and example below, suspendBackground is intended to be a suspending method that switches to a background thread, such as the following:

suspend fun suspendBackground() {
  withContext(backgroundContext) {
    // Inner code
  }
}

Example

A good solution would be to write code in such a way that there are no empty code blocks between suspensions.

We normally try to follow this pattern, however one case we’ve found that makes it difficult is when the suspending API is provided by a library that we don’t own. For example, suppose library A provides LibraryA.suspendBackgroundA(), which is a suspending method that uses a background dispatcher specific to library A. Similarly, library B provides another suspending method LibraryB.suspendBackgroundB(), which uses a different dispatcher specific to library B. An example of such a library could be a data store library.

Suppose that we need to call both methods:

// Initial thread
mainCoroutineScope.launch {
  // Main thread
  libraryA.suspendBackgroundA()
  // Main thread
  libraryB.suspendBackgroundB()
  // Main thread
}

In the example above:

  1. The code starts on initial thread, and calls launch to create a new coroutine which runs on main thread
  2. The coroutine then calls libraryA.suspendBackgroundA() and suspends, then the background-threaded library code runs, and resumes back to main thread
  3. The coroutine then calls libraryB.suspendBackgroundB() and suspends, then the background-threaded library code runs, and resumes back to main thread

So the ordering of thread switches goes:

Initial thread -> main thread -> background A thread -> main thread -> background B thread -> main thread

While this technically matches the specifications of coroutines, we feel a lot of our developers aren't going to be thinking about extra returns to the main thread in some of these places, like at the end of the block or between the suspending library calls. A lot of the time, we think developers want something like

Initial thread -> background A thread -> background B thread

Are there best practices we can follow that can help mitigate these extra dispatches by default? We know we can use Dispatchers.Unconfined as the default context, but that doesn't seem to be a best practice.

Thanks again for taking your time to help with this.

dkhalanskyjb commented 9 months ago

With

suspend fun suspendBackground() {
  withContext(backgroundContext) {
    // Inner code
  }
}

you can avoid extra dispatches by writing your code like so:

// Initial thread
mainCoroutineScope.launch {
  withContext(backgroundContext) { // <------
    // Main thread
    libraryA.suspendBackgroundA()
    // Main thread
    libraryB.suspendBackgroundB()
    // Main thread
  }
}

Or even something like withContext(backgroundContext[CoroutineDispatcher]!!) should do the trick.

ListenableFuture commented 9 months ago

Thanks for the response. From what I understand, the solution you mentioned would only be possible if the caller knows and has access to the dispatchers the suspend methods are using. Since these methods are split by API boundaries, it would generally break encapsulation to know these things. In general the dispatchers may be internal to libraries A and B, and also distinct from each other.

Additionally, in the proposed solution, wouldn’t there still be extra switches to the main thread before and after the withContext block?

// Initial thread
mainCoroutineScope.launch {
  // Main thread
  withContext(backgroundContext) { // <------
    …
  }
  // Main thread
}

We can avoid this by calling mainCoroutineScope.launch(backgroundContext), however I believe it kind of oversimplifies the problem. For example in general, doesn’t it mean a coroutine will always switch back to its original context at the end, even if it’s just to run an empty block?

dkhalanskyjb commented 9 months ago

From what I understand, the solution you mentioned would only be possible if the caller knows and has access to the dispatchers the suspend methods are using. Since these methods are split by API boundaries, it would generally break encapsulation to know these things.

It breaks incapsulation even to discuss the context switches. What if suspendBackground is optimized to rely on the context switches between different calls to suspendBackground? For example, this would perform more context switches with your fix than it would with straightforwardly-written code:

// return result now, and clean up in the same thread in parallel after the user receives the result already
withContext(backgroundContext) {
  awaitEarlierCleanups() // suspend until the cleanups for the earlier procedures finish
  val result = getResultForTheUser()
  myPrivateScope.launch { // uses the same dispatcher as `backgroundContext`
    cleanupComputation(result)
  }
  result
}

When you assume this doesn't happen, you're already breaking the API boundaries. Relying on knowing a particular dispatcher just means going one step further.

Additionally, in the proposed solution, wouldn’t there still be extra switches to the main thread before and after the withContext block?

Yes, there would be.

a coroutine will always switch back to its original context at the end, even if it’s just to run an empty block?

In this particular case with different dispatchers, yes.

ListenableFuture commented 9 months ago

For example in general, doesn’t it mean a coroutine will always switch back to its original context at the end, even if it’s just to run an empty block?

In this particular case with different dispatchers, yes.

Thanks for clarifying. Do you have any advice or best practices on how we can avoid this? For example, I’d like to avoid the switch at the end for something like this:

mainCoroutineScope.launch {
  // Main thread
  doSomeWorkOnMainThread()
  withContext(backgroundContext) {
    // Background thread
    doSomeWorkOnBackgroundThread()
  }
  // Main thread - switch that I want to avoid
}
dkhalanskyjb commented 9 months ago
mainCoroutineScope.launch(backgroundContext, start = CoroutineStart.UNDISPATCHED) {
  withContext(Dispatchers.Main.immediate) {
    doSomeWorkOnMainThread()
  }
  doSomeWorkOnBackgroundThread()
}
dkhalanskyjb commented 9 months ago

However, before overcomplicating code like this, ensure by measuring performance that this is a real issue for your codebase. Manually calculating how many thread switches happened isn't something we see even in codebases prioritizing performance, outside of maybe tight loops.

lowasser commented 9 months ago

Chiming in: I’m the lead for Google's first-party usage of the Kotlin ecosystem. The folks above are the owners of the primary framework for our first-party Android app development.

As I understand it, here are the key points:

It is more common to find this problem in the format

suspend fun runsOnUiThread() {
  doSomething()
  doSomethingElse()
}
suspend fun doSomething() = withContext(backgroundDispatcher) { ... }
suspend fun doSomethingElse() = withContext(backgroundDispatcher) { ... }

where these functions are often separated across files and directories. Three such dispatches were enough to cause 60ms lag in our older issue, so in an app with sufficient UI thread contention, we fear that could add up very quickly to user-perceptible lag.

I don't think we've yet measured this specific concern observably in a Kotlin app, but we're concerned by how outwardly simple, best-practice-following code is being compiled to an antipattern we've seen cause significant performance issues in the past, in a way that seems difficult for static analysis (or even a developer looking by hand -- I wouldn't catch this easily) to detect. This is of additional concern because these extra dispatches are often added as we migrate apps from Java Future-based solutions to coroutines: future-based code doesn't have a dispatch step added as you return a value, but as we migrate to coroutines, we introduce dispatches at points like Y and Z in the above example as we exit from some thread pool to Dispatchers.Main.

The reason I recommended to @ListenableFuture that they file a GitHub issue, rather than a StackOverflow question, is that I think there might be some way for kotlinx.coroutines (or the Kotlin language) to just optimize away these context switches for everyone. For example, perhaps the compiler could change how it builds the suspension state machine to identify and leave out redundant transitions -- especially the "just exit from the scope, maybe with a result" case of Y and Z above. Alternately, perhaps CoroutineDispatcher.isDispatchNeeded could do something smart here that recognizes the unnecessary context switch, or be given some extra information that would help.

I'm not sure if such a solution is possible, but it seemed worth investigating if it can be done at the level of the coroutines framework.

ListenableFuture commented 9 months ago
mainCoroutineScope.launch(backgroundContext, start = CoroutineStart.UNDISPATCHED) {
  withContext(Dispatchers.Main.immediate) {
    doSomeWorkOnMainThread()
  }
  doSomeWorkOnBackgroundThread()
}

Thanks for the details. In general, that solution would require matching the outer coroutine’s dispatcher with the dispatcher of the last suspension point. This pattern may work for some cases, but wouldn’t be possible if the dispatcher is internal to a library. Additionally if a third dispatcher is added to the example, such as a disk I/O dispatcher, this no longer works:

// Initial thread
mainCoroutineScope.launch(backgroundContext, start = UNDISPATCHED) {
  // Initial thread
  withContext(mainCoroutineScope) {
    // Main thread
    doSomeWorkOnMainThread()
  }
  // Background thread - switch that I want to avoid
  withContext(diskIoContext) {
    // Disk I/O thread
    doSomeWorkOnDiskIoThread()
  }
  // Background thread
  doSomeWorkOnBackgroundThread()
}

The ordering of thread switches goes:

Initial thread -> main thread -> background thread -> disk I/O thread -> background thread

I believe in this case, the only solution to have our desired behavior would be to use Dispatchers.Unconfined:

Initial thread -> main thread -> disk I/O thread -> background thread

Regarding the performance impact, there’s two aspects:

  1. The primary concern is the empty blocks may be queued behind other longer-running tasks on the executor, like layout inflation, and so the coroutine takes longer to complete. If the executor has an empty queue, performing context switches to different threads can still add noticeable latency
  2. A secondary concern is the cost to actually run the empty blocks. This is probably trivial, but it’s non-zero, and so the executor may take longer to process other tasks in its queue
dkhalanskyjb commented 8 months ago

I'm not sure if such a solution is possible

I'd wager it's not. Let's look at withContext(outer) { withContext(inner) { A() } }, assuming every outer doesn't use the same threads as inner. What looks like empty spaces around the inner withContext actually has behavior attached to it.

It's not technically impossible to only touch the outer thread if CopyableThreadContextElement instances are there, cancellation happened, or some non-trivial computation took place after, but this would require either changing the ABI of suspend functions significantly, or introducing a completely new mechanism of extremely complex rewrite rules to the Kotlin compiler.

@qwwdfsad, do you have any extra insights regarding this?

qwwdfsad commented 8 months ago

Thanks for the detailed discussion, especially for the examples of the potential performance issues.

The reasonable fix for this issue does not seem to be possible for a variety of reasons, most of them highlighted by @dkhalanskyjb.

The orthogonality of suspendable computations (basically, suspend fun in the language) and the way they form multitasking (interception in the language terms, kotlinx.coroutines in fact) is both a blessing and a curse.

Any attempt to optimistically guess, cheat or ignore this orthogonality will lead to an encapsulation breakage (which might be okay in some scenarios and will lead to puzzlers and counter-intuitive behaviour in others) and, more importantly, to conceptual disparity, where the semantics of the code starts depending on the minor details.

Here is the taste of it:

CoroutineDispatcher and its operations are observable and potentially stateful entities, as well as ThreadContextElement. Unfortunately, it means that skipping or adding a potential dispatch ("thread switch") is an observable behaviour that may change the semantics or the output of the program.

Even with such an approximation, the abstraction becomes leaky immediately. E.g. consider this optimization somehow implemented and a large enough codebase decided to implement dispatcher wrapping in order to give it a name. As long as such a change makes it into the codebase, it changes the behaviour of the whole codebase that uses such a dispatcher and (now) its wrapped version. Moreover, it's the worst of the two worlds: it potentially changes the semantics of users who do not care about the number of dispatches at all ("current state"), and it also changes the performance characteristic of the code that relied on this optimization (because, ultimately, performance is the reason this potential).

Apart from that, it immediately exposes the question Dmitry raised -- is the "empty space" a thing to make this optimization disappear? A code comment? Unused variable optimized by kotlinc? An empty inline function?

So, at the end of the day, this optimization is extremely brittle, yet it has visible and observable side effects.

elizarov commented 8 months ago

I think that you can alleviate this problem by using withContext in strategic places without breaking encapsulation. Let's work from the example by @lowasser :

suspend fun runsOnUiThread() {
  doSomething()
  doSomethingElse()
}
suspend fun doSomething() = withContext(backgroundDispatcher) { ... }
suspend fun doSomethingElse() = withContext(backgroundDispatcher) { ... }

This particular case is easy to fix by:

suspend fun runsOnUiThread() {
  withContext(backgroundDispatcher) {
    doSomething()
    doSomethingElse()
  }  
}

But what if runsOnUiThread does not know which particular dispatchers doSomethingXxx functions use? It is not a problem as long as you have the following rule:

All dispatchers used in all your libraries must be derived from Dispatchers.Default.

All such dispatchers are capable of sharing threads. With such a rule you can always write:

suspend fun runsOnUiThread() {
  withContext(Dispatchers.Default) {
    doSomething()
    doSomethingElse()
  }  
}

Now, if your doSomethingXxx are annotated with some annotation to specify the fact that they are not just main-safe, but they can be run from an arbitrary dispatcher (since they encapsulate their own dispatcher) one can even imagine that an IDE can automatically suggest to wrap a chain of such calls into withContext(Default) if it can deduce that the execution will happen on the main thread (e.g. in a function annotated that it runs on a main thread).

ListenableFuture commented 8 months ago

Regarding latest posts from @dkhalanskyjb and @qwwdfsad:

Thanks for the details, it seems it's unlikely that any changes can be made to the current behavior, however it’s still helpful to know this. Our goal is to provide our teams a set of guidelines they can follow, which may vary depending on the team’s requirements.

We believe it’s fine if not all cases are fully optimized, such as long-running operations on background threads. In these cases, performing an extra thread switch should have relatively minor impact. We can also provide automated checks to avoid certain patterns, for example:

mainCoroutineScope.launch {
  withContext(backgroundContext) {
    …
  }
}

The above example can be replaced with mainCoroutineScope.launch(backgroundContext) { … }.

For cases where it’s necessary to have a fully optimized solution, we believe Unconfined is the only way to achieve this, so we’re considering recommending using Unconfined as the root dispatcher:

unconfinedScope = CoroutineScope(Dispatchers.Unconfined + SupervisorJob())

From our understanding, there’s two concerns with inline dispatch - running code on unexpected threads, and running code in unexpected orderings. To mitigate this, we would require that any blocks of code within unconfinedScope are wrapped in withContext. For example:

unconfinedScope.launch {
  withContext(Dipatchers.Unconfined) {
    doSomeWorkOnAnyThread()
  }
  withContext(mainContext) {
    doSomeWorkOnMainThread()
  }
  withContext(backgroundContext) {
    doSomeWorkOnBackgroundThread()
    doSomeOtherWorkOnBackgroundThread()
  }
}

With this approach, developers explicitly specify a dispatcher for each block of code, which can be enforced through an automated check. The goal is to ensure devs actively think about how to dispatch each block of code, which should hopefully avoid the dangers of inline dispatch.

We’d wanted to ask if the details above seem reasonable, or if you may have any further advice. Thanks

Regarding latest post from @elizarov:

We may have several dispatchers which are created from our existing Java executors using ExecutorService.asCoroutineDispatcher(). The dispatchers can each be configured differently, and designed to run specific types of tasks. For example, each dispatcher may define the priority of its threads, whether its thread pool size is fixed or unbounded, and other specific configurations.

From the docs, Dispatchers.Default provides a single thread pool, with thread count between 2 and number of CPU cores. From what I understand, I don’t believe it would be feasible to derive all of our dispatchers from this pool.

elizarov commented 8 months ago

@ListenableFuture Dispatchers.Default is designed to be a better replacement for existing Java executors that are created using ExecutorService.asCoroutineDispatcher().

There is CoroutineDispathcer.limitedParallelism method that provides a better, almost drop-in, replacement for fixed-sized thread pools. Using this method you can control the number of threads you allocate for different kinds of tasks with the benefits that different "pools" actually share threads and avoid thread-switching as much as possible.

If you have other requirements beyond controlling the number of threads, then you can work together with @qwwdfsad and @dkhalanskyjb to design addional APIs for coroutine scheduler to take them into account so that threads can be shared and thread-switching avoided for your other scenarios, too.

I really urge you to have this discussion. An architecture where different thread pools in your app have the same underlying scheduler implementation capable of sharing threads is the right way to go. Moreover, the implementation of Kotlin's CoroutineDispatchers uses state-of-the-art modern scheduling algorithms and outperforms all Java executor implementations (both FJP and FTP) on virtually all benchmarks.

dkhalanskyjb commented 8 months ago

@ListenableFuture

From the docs, Dispatchers.Default provides a single thread pool, with thread count between 2 and number of CPU cores. From what I understand, I don’t believe it would be feasible to derive all of our dispatchers from this pool.

Technically, you can't do that with only Dispatchers.Default, you also need Dispatchers.IO. They share the same thread pool, so switching between Dispatchers.Default and Dispatchers.IO typically doesn't involve a context switch. And Dispatchers.IO.limitedParallelism can be used to obtain an arbitrary number of threads. See the "elasticity" section of the documentation on Dispatchers.IO.

ListenableFuture commented 8 months ago

Thanks for the details, from what I understand this approach would still have an issue with head-of-line blocking. Let’s suppose we have a single-threaded background scope and a disk I/O scope, which share their underlying threads to help avoid extra dispatches:

singleThreadDispatcher = Dispatchers.IO.limitedParallelism(1)
singleThreadScope = CoroutineScope(singleThreadDispatcher + SupervisorJob())

diskIoDispatcher = Dispatchers.IO.limitedParallelism(8)
diskIoScope = CoroutineScope(diskIoDispatcher + SupervisorJob())

Suppose we then have:

singleThreadScope.launch {
  // Single thread - empty block
  withContext(diskIoDispatcher) {
    // Disk I/O thread
    doSomeWorkOnDiskIoThread()
  }
  // Single thread - empty block
}

In the example above, if singleThreadDispatcher is already busy executing other tasks, it seems those tasks would need to complete before the empty blocks in the example above can be processed. Alternatively, if singleThreadDispatcher processed the empty blocks before executing its other tasks, or in parallel, it would break the scope’s limitedParallelism(1) guarantee. So the optimization with sharing threads may help avoid dispatches in certain cases, however it still suffers from head-of-line blocking, which is one of our primary concerns. Note this isn’t limited to single-threaded dispatchers, and would apply to any dispatcher with limited parallelism.

Given the above, and since sharing dispatchers can take a while to migrate to, I also wanted to clarify your thoughts on the proposal in the post above regarding an unconfinedScope. To summarize, each block of code within unconfinedScope would be wrapped in a withContext block to specify its dispatcher, which should prevent code from running on unexpected threads. This should also hopefully mitigate the concerns with running code in unexpected orderings, as long as developers understand its behavior. I was wondering if this might be a reasonable approach, at least for cases where it’s necessary to have an optimized solution.

dkhalanskyjb commented 8 months ago

I also wanted to clarify your thoughts on the proposal in the post above regarding an unconfinedScope.

This won't help you avoid extra dispatches in this case:

unconfinedScope.launch {
  withContext(dispatcher) {
  }
  withContext(dispatcher) {
  }
}

There will be two dispatches to dispatcher. Here's a playground: https://pl.kotl.in/qRkhSvVXQ

ListenableFuture commented 7 months ago

Regarding UnconfinedScope:

Thanks for clarifying, so it seems Dispatchers.Unconfined wouldn’t be optimal in all cases either. In your example the two withContext blocks with the same dispatcher could be merged together, however if they’re hidden away in a library API it no longer works. For example:

unconfinedScope.launch {
  for (i in 1..10) {
    library.suspendBackground()    
  }
}

In the example above, we can wrap the for loop with withContext(backgroundContext) to avoid the extra dispatches, but if we’re unsure what dispatchers the library uses internally, or if the library changes in the future, this wouldn’t work either.

Understanding these details is helpful, however it makes it more difficult to provide concrete guidelines to developers for when an optimal solution is needed.

Having said this, I think the extra dispatches with unconfinedScope is still consistent with Java, where if you transform one future to another on the same executor, it will submit to the executor again instead of continue using the same thread.

In your example, I still wanted to ask if it’s possible to change the behavior to avoid the second dispatch when using unconfinedScope. Are there any risks to this? I’m also wondering if this would need to be done on the Kotlin side, or if it can be done with a custom dispatcher implementation.

Regarding Sharing Thread Pools:

As a separate question, regarding the discussion about sharing thread pools, I wanted to ask if that approach would still be possible if the thread pools customize the priority of their threads, or track thread-local state. For example, a thread pool may use ThreadLocal to identify certain characteristics of the threads in its pool, such as Strict Mode state.

dkhalanskyjb commented 7 months ago

In your example, I still wanted to ask if it’s possible to change the behavior to avoid the second dispatch when using unconfinedScope.

I think it's technically possible. The continuation after withContext could know the last "real" dispatcher it was called on, and the withContext that follows could detect this "real" dispatcher and avoid a dispatch.

Are there any risks to this?

Yes. As Vsevolod explained in https://github.com/Kotlin/kotlinx.coroutines/issues/4040#issuecomment-1965006211, this optimization could break some code.

if it can be done with a custom dispatcher implementation.

Yes. Dispatchers.Main.immediate, for example, detects if it's dispatched from the main thread and doesn't go through a dispatch if so. It's important to note that using Dispatchers.Main.immediate instead of Dispatchers.Main is an opt-in, so the coroutines library isn't arbitrarily skipping dispatches in this case. You can write your own custom dispatchers that do the same thing and detect if the thread is already the one that's needed.

I wanted to ask if that approach would still be possible if the thread pools customize the priority of their threads, or track thread-local state.

I don't understand what issues could arise. Could you elaborate?

For example, a thread pool may use ThreadLocal to identify certain characteristics of the threads in its pool

There's a way to integrate coroutines with ThreadLocal values so that a ThreadLocal is preserved in the coroutine as if the whole coroutine was executed in a single thread: https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-copyable-thread-context-element/ Is this what you are looking for?