Kotlin / kotlinx.coroutines

Library support for Kotlin coroutines
Apache License 2.0
12.91k stars 1.84k forks

Stabilize CoroutineDispatcher.limitedParallelism #3864

Closed qwwdfsad closed 2 months ago

qwwdfsad commented 12 months ago

In general, we received mostly positive feedback about the API shape, and there is no reason to keep it experimental any further.

A few things to address during stabilization:

dovchinnikov commented 11 months ago

Since I started advertising limitedParallelism in the IJ codebase, I've seen misunderstandings of Dispatchers.IO's elasticity: it's not consistent with the other dispatchers, it raises questions, and I have to direct people to the doc.

I'm afraid that this inconsistency will quickly become a gotcha. I think in the long run it's better to expose the internal unbounded dispatcher since it's already available via Dispatchers.IO.limitedParallelism(Int.MAX_VALUE), and then make Dispatchers.IO.limitedParallelism non-"elastic" for consistency with other dispatchers.

(It also feels like the inconsistency exists only for the sake of API beautification, I hope I'm wrong here, correct me, please)

elizarov commented 11 months ago

Can you be more specific, please, about the gotchas with Dispatchers.IO that you are worried about?

qwwdfsad commented 11 months ago

(It also feels like the inconsistency exists only for the sake of API beautification, I hope I'm wrong here, correct me, please)

One more reason to move towards public design notes, because I'm going to reiterate ours 😅

Initially, Dispatchers.IO was no different from any other dispatchers.

The design of Dispatchers.IO is straightforward -- it has an arbitrarily large (64, but the exact number does not really matter unless it's too large) number of threads to offload work to, it by design cannot be unbounded, and it's used as the default sink for any blocking or low-priority tasks in an application. The best part is that it is a sink: users don't have to figure out what the thread limit is, when starvation occurs, etc. If one needs something specific, e.g. a dispatcher sized in accordance with a DB connection pool, they should use newFixedThreadPool(100) and call it a day. So far so good.
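For illustration, the pre-limitedParallelism pattern described above might look like this (a sketch; the pool size of 100 mirrors the hypothetical DB connection pool, and the name is mine):

```kotlin
import java.util.concurrent.Executors
import kotlinx.coroutines.*

// A dedicated dispatcher sized to match a hypothetical DB connection pool,
// as was idiomatic before limitedParallelism: wrap a fixed thread pool.
val dbDispatcher = Executors.newFixedThreadPool(100).asCoroutineDispatcher()

// Note it owns its 100 threads and must be shut down explicitly:
// dbDispatcher.close()
```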

Two downsides though -- a lot of context switching and, more importantly, a waste of resources -- each dedicated dispatcher is a bunch of threads, and each thread occupies 2MB of [potentially not committed] memory.

With limitedParallelism we wanted to address all these problems and are directly advocating for the following pattern:

// 100 threads for MySQL connection
val myMysqlDbDispatcher = Dispatchers.IO.limitedParallelism(100)
// 60 threads for MongoDB connection
val myMongoDbDispatcher = Dispatchers.IO.limitedParallelism(60)
// 2 threads for logger
val loggers = Dispatchers.IO.limitedParallelism(2)

Threads are reused as much as possible, nothing is allocated unless really required, and there is no need to shut down executors. All good.

Except that Dispatchers.IO's size is 64. No issues will be uncovered during unit testing, regular quality assurance, or an average production run. Yet once the application is heavily loaded (e.g. a once-a-season spike), it is brought to its knees, with little to no way to easily figure out what the problem is.

So, we have an API that solves all our problems except one -- it's incident-prone, and such incidents are almost impossible to find in advance. Providing an effectively unbounded pool for limited parallelism (which, on the other hand, limits it :)) was the most reasonable solution among the alternatives considered.

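To make the mechanics concrete, here is a toy model of a parallelism-limited view over a shared pool (my own simplified sketch in plain java.util.concurrent, not kotlinx's actual scheduler): tasks queue up in the view, at most `parallelism` of them occupy pool threads at once, and one shared pool can back many views without dedicating threads to any of them.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Executor
import java.util.concurrent.atomic.AtomicInteger

// Toy model, not kotlinx's implementation: a view that lets at most
// `parallelism` queued tasks run on the shared pool at any moment.
class LimitedView(private val pool: Executor, private val parallelism: Int) : Executor {
    private val queue = ConcurrentLinkedQueue<Runnable>()
    private val running = AtomicInteger(0)

    override fun execute(task: Runnable) {
        queue.add(task)
        tryDispatch()
    }

    private fun tryDispatch() {
        while (true) {
            if (queue.isEmpty()) return
            val r = running.get()
            if (r >= parallelism) return          // all slots busy; a finishing task re-dispatches
            if (!running.compareAndSet(r, r + 1)) continue
            val task: Runnable? = queue.poll()
            if (task == null) {                   // lost the race for the task; release the slot
                running.decrementAndGet()
                continue
            }
            pool.execute {
                try {
                    task.run()
                } finally {
                    running.decrementAndGet()
                    tryDispatch()                 // keep draining the queue
                }
            }
            return
        }
    }
}
```

In this model a 64-wide view (IO), a 100-wide view (MySQL), and a 2-wide view (loggers) can all share one pool, and nothing prevents one view from being wider than another view of the same pool -- which is the elasticity discussed above.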
qwwdfsad commented 11 months ago

I'm afraid that this inconsistency will quickly become a gotcha.

Could you please provide a few examples where this behaviour is a problem or a gotcha that required a fix?

dovchinnikov commented 11 months ago

Except that Dispatchers.IO size is 64.

This. When I asked why another newFixedThreadPool existed, the answer was that "Dispatchers.IO is bounded" and the author didn't want to exhaust Dispatchers.IO, so I had to explain that they would not exhaust it with limitedParallelism. That's why I find Dispatchers.Unbounded.limitedParallelism(100) clearer than Dispatchers.IO.limitedParallelism(100) for the same use case.

Providing an effectively unbounded pool ...

That's what I mean by Dispatchers.Unbounded

... for limited parallelism (which, on the other side, limits it :)) was the most reasonable solution among others

And still it was not chosen, why?

dovchinnikov commented 11 months ago

Could you please provide a few examples where this behaviour is a problem or a gotcha that required a fix?

Basically, the problem is that Dispatchers.IO.limitedParallelism is left unused, which leads to

a lot of context-switching and, more importantly, a waste of resources

dkhalanskyjb commented 11 months ago

I think this is a case where being too educated becomes a problem. IDEA developers are not our only users, and most of our users don't even know Dispatchers.IO has a limited number of threads. For them, this is just the way to run blocking tasks, and the 64 threads are a safeguard silently keeping the peace in the background so that too many threads don't get spawned accidentally. Dispatchers.IO.limitedParallelism was made elastic so that "the obvious thing", creating a view of a dispatcher dedicated to blocking tasks, is also the correct one. If people managed to learn coroutines deeply enough to know that there's a 64-thread limit, I think they can also learn these new details about its behavior.

dovchinnikov commented 11 months ago

Another thing, related to the intent of the code.

Some client, which the platform does not control, may regularly exhaust Dispatchers.Default. In a few cases we cannot afford starvation because of some plugin. The solution is to have a separate dispatcher, which is private to the platform and cannot be used by anybody else. The dispatcher looks like Dispatchers.IO.limitedParallelism(Runtime.getRuntime().availableProcessors()). Under load there will be 2x CPU threads doing CPU work, but at least the system will make progress. The dispatcher declaration raises a question: if it's for IO-bound tasks, then why does it have a size equal to the CPU count; and if it's for CPU-bound tasks, then why is it a view of Dispatchers.IO?
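The pattern described above, written out under an assumed name (the @OptIn reflects that limitedParallelism was still experimental at the time of this thread):

```kotlin
import kotlinx.coroutines.*

// Hypothetical name, mine. A platform-private fallback dispatcher:
// CPU-count parallelism, but carved out of Dispatchers.IO's elastic pool,
// so code that exhausts Dispatchers.Default cannot starve it.
@OptIn(ExperimentalCoroutinesApi::class)
val platformFallbackDispatcher: CoroutineDispatcher =
    Dispatchers.IO.limitedParallelism(Runtime.getRuntime().availableProcessors())
```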

dkhalanskyjb commented 11 months ago

The case you're describing is an interesting one, thanks!

The issue is that we're a widely used general-purpose library, so each public definition has a cost. Introducing a whole new Dispatchers.Unbounded would add much confusion, and the case you're describing (intentionally wanting to conflict with Dispatchers.Default for CPU time) is something the overwhelming majority of our users will never come near encountering. If the dispatcher declaration looks unclear, there's always the option to write something like val Dispatchers.Unbounded = Dispatchers.IO.limitedParallelism(Int.MAX_VALUE) in your own code. Do you think causing half of the Android developers out there a headache just so a workaround in IDEA's code looks more idiomatic is a worthwhile tradeoff?
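For what it's worth, the suggested user-land workaround needs a getter to compile, since an extension property cannot have an initializer; a sketch:

```kotlin
import kotlinx.coroutines.*

// User-land "Dispatchers.Unbounded" as suggested above, as an extension
// property on the Dispatchers object.
@OptIn(ExperimentalCoroutinesApi::class)
val Dispatchers.Unbounded: CoroutineDispatcher
    get() = IO.limitedParallelism(Int.MAX_VALUE)
```

Note that each access creates a fresh view; caching the view in a private top-level val and returning it from the getter would be the more careful variant.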

dovchinnikov commented 11 months ago

Introducing a whole new Dispatchers.Unbounded would add much confusion

I'd expect that it would at least make users think about why both Dispatchers.IO and Dispatchers.Unbounded exist, and then they would discover that Dispatchers.IO is indeed a bounded view over Dispatchers.Unbounded. I mean that it's not confusion but education, in a natural way.

Consider this:

Why do Dispatchers.Default and Dispatchers.IO even exist in that picture when one can define their own? To convey the intent of the API user: Default for CPU-bound tasks, IO for IO-bound tasks.

Now, limitedParallelism works the same way with any of those, which reduces cognitive load by avoiding another inconsistency and avoiding having to remember that such a thing as "elasticity" exists.

(Dispatchers.Default still needs to exist in the lib because it's used when no dispatcher is specified. I'm omitting that because I'd like to be able to override the default coroutine context with an SPI, which is off topic, but it should make the effective default dispatcher configurable and possibly not equal to Dispatchers.Default.)

dkhalanskyjb commented 11 months ago

I understand your desire for a simpler conceptual model as someone who routinely looks into how coroutines work internally. You are right, exposing Dispatchers.Unbounded would be educational if we wanted people to learn about these implementation details, and yes, aside from performance optimizations related to work-stealing strategies that are aware of whether work is CPU-bound or I/O-bound, Dispatchers.Default could be a view of Unbounded.

The problem is that these details are almost always irrelevant, and by exposing them we would only make it harder to understand what's important and what isn't, muddying the conceptual field in which people actually operate.

— I can only call my API endpoint one request at a time. How do I do that?
— You need val apiEndpointDispatcher = Dispatchers.IO.limitedParallelism(1). Because you're connecting over the network, you need Dispatchers.IO, and limitedParallelism(1) here means that not more than one thread at a time may work there.
— Ok, thanks. I also need a thread to connect to SQLite. Should I add another val sqliteDispatcher = Dispatchers.IO.limitedParallelism(1)?
— Yep.
— Gotcha. Oh, hey, we're going to move from SQLite to MySQL soon, which allows up to 100 simultaneous connections. Do I just replace 1 with 100?
— No-no-no-no! Dispatchers.IO doesn't have 100 threads in it, you'll be shooting yourself in the foot. Also, even your dispatcher for the API endpoint may stop working if you do that. You must keep the sum of views on Dispatchers.IO well under 64 at all times. I'd say keep it under 60, so that at least 4 threads are available for work launched on Dispatchers.IO directly.
— Oh.
— You need Dispatchers.Unbounded.limitedParallelism(100).
— What's Dispatchers.Unbounded?
— It's like Dispatchers.IO but without the 64-thread limit.
— Ok, why would I ever use Dispatchers.IO then if it's so error-prone?
— Usually, you do need this limit. Be careful not to use Dispatchers.Unbounded directly in cases when I/O work can come in quick succession: if you get a thousand requests for file reading and push them all to Dispatchers.Unbounded, it may create a thousand threads.
— Is that bad?
— Well, your program may become unresponsive.
— Aren't threads supposed to help utilize a computer to its fullest?
— Yes, but if there are too many threads, there's a cost to switching between them. In fact, just to be safe, don't use Dispatchers.Unbounded directly at all, only its views. This way, you'll have a bounded number of threads at all times, even though Unbounded is used.
— What a mess. So, if it's error-prone to take views of Dispatchers.IO and it's also error-prone to not take views of Dispatchers.Unbounded, why didn't the coroutines guys just make it so that it's safe to take views of Dispatchers.IO and get rid of Dispatchers.Unbounded altogether?
— Don't be silly. This wouldn't be conceptually clean. After all, Dispatchers.IO does only have 64 threads (or however many you configure), and if you're taking its view, clearly you shouldn't expect more threads to appear out of thin air.

Now imagine you're the person asking the questions, but you don't have anyone to give you answers. All you wanted to do was create a connection pool that uses threads marked for I/O-bound work (because that's the level you operate on), but now you have to dig into liveness, understand thread pools, scheduling, and so on. Coroutines are supposed to make multithreading easily available, not just dump another layer of complexity on top of what's already difficult about it.

The fact that most people don't know about the 64-thread limit is a testament to it being unobtrusive in practice, not a thing to "fix."

dovchinnikov commented 11 months ago

So, if it's error-prone to take views of Dispatchers.IO and it's also error-prone to not take views of Dispatchers.Unbounded, why didn't the coroutines guys just make it so that it's safe to take views of Dispatchers.IO and get rid of Dispatchers.Unbounded altogether?

So, I totally understand and support the concern that exposing Dispatchers.Unbounded might be dangerous.

If you don't want to expose Dispatchers.Unbounded, then how about this:

// returns a new dispatcher view over global scheduler pool
fun Dispatchers.newDispatcher(parallelism: Int): CoroutineDispatcher = TODO()

— I can only call my API endpoint one request at a time. How do I do that?
— You need val apiEndpointDispatcher = Dispatchers.newDispatcher(1). Because you're connecting over the network, you need a new dispatcher; 1 here means that not more than one thread at a time may work there.
— Ok, thanks. I also need a thread to connect to SQLite. Should I add another val sqliteDispatcher = Dispatchers.newDispatcher(1)?
— Yep.
— Gotcha. Oh, hey, we're going to move from SQLite to MySQL soon, which allows up to 100 simultaneous connections. Do I just replace 1 with 100?
— Yep!
— So, it's error-prone to take views of Dispatchers.IO?
— Don't be silly. Dispatchers.IO does only have 64 threads (or however many you configure), just like Dispatchers.Default has CPU-count threads (or however many you configure), and if you're taking its view, clearly you shouldn't expect more threads to appear out of thin air, just like with any other limitedParallelism dispatcher.

This approach aligns with newFixedThreadPool in a way, i.e. Executors.newFixedThreadPool(x) -> Dispatchers.newDispatcher(x).
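For comparison, the proposed API could be prototyped in user code today on top of the elastic behavior being debated (a sketch, not an endorsement of either design):

```kotlin
import kotlinx.coroutines.*

// Prototype of the proposed Dispatchers.newDispatcher in user code:
// a fresh parallelism-limited view over the shared scheduler pool,
// relying on the elasticity of Dispatchers.IO discussed in this thread.
@OptIn(ExperimentalCoroutinesApi::class)
fun Dispatchers.newDispatcher(parallelism: Int): CoroutineDispatcher =
    IO.limitedParallelism(parallelism)
```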

dkhalanskyjb commented 11 months ago

Even if we agreed, the ship has sailed: https://duckduckgo.com/?q="dispatchers.io.limitedparallelism"+!g&t=ffab&ia=web There's plenty of documentation out there that already points to Dispatchers.IO.limitedParallelism as the way to go, so keeping limitedParallelism but removing the elasticity property would already be a huge breaking change for us.

Also, I feel that we're speaking past each other. As I see it, Dispatchers.IO being 64-threaded is an implementation detail: a response to the problem of bursts of tasks. It's not an important property one should focus on when writing code, except in deeply technical, infrastructural code. The inherent limit on the number of I/O tasks that should be allowed to execute in parallel is much higher than 64; that's just an arbitrary safe number. Maybe one day we'll discuss removing this limit by default and replacing it with some throttling on creating new threads. I don't see anything conceptually stopping us from doing it, only technical considerations.

Your point, as I see it: "IO is limited to 64 threads, so its views should be a subset of that, or the API becomes muddied." If we took the 64-thread limit as an inherent property of IO, that would be true.

My point: "The limit already makes the API dirty, and if people naively used views of IO that respected this limit, the leaky abstraction behind IO would be even more noticeable, and this shouldn't happen." People shouldn't have to know that there's a limit in order to write correct code that uses Dispatchers.IO, especially when incorrect code would look and work correctly in all but the most critical circumstances.

To reiterate my initial point: if you're deep enough to know about the 64-thread limit, you're deep enough to learn about elasticity easily. If you're tired of explaining elasticity to colleagues, you can introduce Dispatchers.Unbounded in your code and replace Dispatchers.IO.limitedParallelism with that.

If you have some considerations other than API purity, like if you can imagine error-prone code resulting from people not knowing about elasticity, please share them. Could someone want to hit the 64-thread limit and be surprised when it doesn't happen in the presence of views? Why?

pacher commented 9 months ago

How about exposing Unbounded but not letting it implement Dispatcher? Call it DispatcherProducer or CommonCoroutinesThreadPool or whatever. The only method available on this new entity is limitedParallelism, which creates a Dispatcher. This way, as a user, when I Ctrl-click through Dispatchers I will see

Dispatchers.Default = CommonPool.limitedParallelism(numberOfCPU)
Dispatchers.IO = CommonPool.limitedParallelism(64)
etc.

which is pretty clear and makes the relationship between dispatchers (avoiding context switches) obvious. There would also be no question of how to create my own MySQL dispatcher backed by the same pool.

Then we could still make IO special in the sense that when limitedParallelism is called on it, the resulting dispatcher is a fresh dispatcher from the underlying pool and not a view of IO, so that all the documentation out there is still valid and there is no breaking change.

qwwdfsad commented 4 months ago

On a side note, there seems to be a misconception about limitedParallelism(1) being "racy", for example -- https://github.com/KStateMachine/kstatemachine/blob/master/docs/index.md#use-single-threaded-coroutinescope

While this might not be a goal of our documentation per se, it would still be nice to explicitly bust this myth.
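A small sketch (mine, not from the linked docs) of why the myth is false: unsynchronized state confined to a limitedParallelism(1) view loses no updates, because the view runs at most one task at a time, so tasks never race even though they may run on different threads:

```kotlin
import kotlinx.coroutines.*

// limitedParallelism(1) behaves like a single-threaded dispatcher for
// mutual-exclusion purposes, even though the actual thread may differ
// between tasks: no two tasks run concurrently, so a plain var is safe.
@OptIn(ExperimentalCoroutinesApi::class)
fun countSerially(updates: Int): Int = runBlocking {
    val single = Dispatchers.IO.limitedParallelism(1)
    var counter = 0 // no lock, no atomic
    List(updates) { launch(single) { counter++ } }.joinAll()
    counter // always equals `updates`: no lost increments
}

fun main() {
    println(countSerially(10_000))
}
```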