Closed alamb closed 3 days ago
The explanation for the problem is super clear
One thing I don't understand is: For Tokio's convention, all tasks created by spawn
are expected to be IO-bounded, and those CPU-bound task should be created using spawn_blocking()
. I think the implementation also uses two separate thread pool, which is very similar to the approach in this doc.
Why can't we all use spawn_blocking()
for all CPU-bounded task, and instead we have to use two runtimes explicitly 🤔
If we follow this two runtime approach, would existing spawn_blocking()
in the code (e.g. https://github.com/apache/datafusion/blob/6d8313ebc865f9bff007bfc04652f58b016cbc1b/datafusion/physical-plan/src/spill.rs#L53) cause unexpected behavior (3 thread pools co-exist)
I took a quick look at this and content looks good, couldn't check diagrams as on phone.
Why can't we all use spawn_blocking() for all CPU-bounded task, and instead we have to use two runtimes explicitly
We could, however, to preserve the thread per core architecture we would need to cap the threads of the blocking pool to the core count. Then as blocking tasks can't yield waiting for input, every CPU bound morsel would have to be spawned separately. This would give us a "morsel-driven" scheduler, however, tokio has a relatively high per task overhead and so even discounting the sheer amount of boilerplate this would require, the performance would be regrettable. If going down this path you might as well switch to an actual morsel driven scheduler (although this is at this stage likely intractable).
Ultimately spawn_blocking is designed for blocking IO, it is not designed for CPU bound tasks.
Why can't we all use spawn_blocking() for all CPU-bounded task, and instead we have to use two runtimes explicitly 🤔
Thank you @2010YOUY01 for the question and @tustvold for the answer -- this question also has come up in the past so I added a summary in ad161794d76731d4472bec3c77c0e751affd1f46
Thank you everyone for the comments -- I am going to merge this one in and make some adjustments as a follow on PR to avoid keeping it open for too long. Long live comments!
Which issue does this PR close?
Rationale for this change
In order to understand the need for a second runtime, I needed to write up the background material so it made sense.
Also, this is starting to come up with dft:
What changes are included in this PR?
This PR adds background documentation on how DataFusion runs plans and why a separate Runtime may be needed to keep the network busy
Note I am also working on an example to show how to actually use a second runtime which I will link to these docs when it is ready
Also, I found myself on a ✈️ without WIFI so I also made a bunch of ASCII art while I was at it)
Are these changes tested?
By CI
Are there any user-facing changes?
Nice documentation hopefully!