dart-lang / sdk


performance isolates #51603

Open gmb119943 opened 1 year ago

gmb119943 commented 1 year ago

From a code performance point of view, is it better to use isolate pools to send unrelated tasks to isolates for execution? Or is it possible to create a new isolate for each task without loss of code performance?

a-siva commented 1 year ago

This is going to depend on a number of factors

gmb119943 commented 1 year ago

The scenario is roughly the following. There is a set of unrelated tasks. For each task, a new isolate is created, and it uses Isolate.exit to pass the result back without copying. Would it be better to keep a ready pool of isolates in this case, or is the cost of creating an isolate always minimal, as long as the number of isolates stays below the maximum limit? (On my PC, no more than 16 isolates can be created.)

lrhn commented 1 year ago

As @a-siva says, "that depends".

Which operation dominates the computation? And is it speed or memory which is more important?

If you use Isolate.run, you spend time creating a new isolate and sending the initial message. Then you do the computation. Then the result comes back for free, because it is handed over rather than copied. And the isolate goes away when it's done and takes no more memory.
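As a minimal sketch of the one-isolate-per-task approach, assuming a Dart SDK recent enough to have Isolate.run (the `sumOfSquares` task is a made-up stand-in for real work):

```dart
import 'dart:isolate';

// Hypothetical CPU-bound task, standing in for the real computation.
int sumOfSquares(int n) {
  var total = 0;
  for (var i = 0; i < n; i++) {
    total += i * i;
  }
  return total;
}

Future<void> main() async {
  // Spawns a fresh isolate, runs the closure there, and completes with
  // the result once that isolate has exited.
  final result = await Isolate.run(() => sumOfSquares(1000));
  print(result); // prints 332833500
}
```

Nothing has to be cleaned up afterwards: the isolate is gone by the time the future completes.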

If you use an isolate pool, you spend no time creating an isolate, send the initial message, do the computation, then spend time copying back the result. And the isolate stays alive, taking up memory, whether you use it again or not.
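A pool of one can be sketched roughly like this. The names `workerMain`, `request` and `square` are made up for illustration, and sending `null` as a shutdown signal is an arbitrary convention of this sketch, not something the SDK defines:

```dart
import 'dart:isolate';

// Hypothetical task, standing in for the real computation.
int square(int n) => n * n;

// Entry point of the long-lived worker. It reports its own SendPort,
// then answers every request until it receives null.
void workerMain(SendPort toMain) {
  final inbox = ReceivePort();
  toMain.send(inbox.sendPort);
  inbox.listen((message) {
    if (message == null) {
      inbox.close(); // no open ports left, so the worker can shut down
      return;
    }
    final request = message as List;
    final input = request[0] as int;
    final replyTo = request[1] as SendPort;
    replyTo.send(square(input)); // the result is copied to the receiver
  });
}

// Sends one task to an already running worker and waits for the reply.
Future<int> request(SendPort worker, int input) async {
  final reply = ReceivePort();
  worker.send([input, reply.sendPort]);
  return await reply.first as int;
}

Future<void> main() async {
  final fromWorker = ReceivePort();
  await Isolate.spawn(workerMain, fromWorker.sendPort);
  final toWorker = await fromWorker.first as SendPort;

  // The same isolate serves both requests; nothing is spawned per task.
  print(await request(toWorker, 7)); // prints 49
  print(await request(toWorker, 12)); // prints 144

  toWorker.send(null); // without this the worker stays alive
}
```

The worker keeps its memory for as long as its port is open, which is exactly the cost described above.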

The sending of the initial message and doing the computation are fixed costs.

For small return values, not creating a new isolate is definitely faster. For large return values, creating a new isolate, but getting free return shipping, is definitely faster.

To find the cut-off point, you will have to measure your program. The start-up time of an isolate will most likely depend, at least a little, on the size of the program it's being spawned from. Even with fast isolate spawning and sharing of immutable data, there will be some setup to make space for global mutable variables, which exist per-isolate.
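One rough way to look for that cut-off is to time both approaches over a range of result sizes. This is only a sketch: `makeResult`, `workerMain` and the two timing helpers are hypothetical, a single run per size is noisy, and a real measurement should repeat each case and use your actual task.

```dart
import 'dart:isolate';

// Hypothetical task whose result size is the thing being varied.
List<int> makeResult(int length) => List<int>.generate(length, (i) => i);

// Long-lived worker: serves requests until it receives null.
void workerMain(SendPort toMain) {
  final inbox = ReceivePort();
  toMain.send(inbox.sendPort);
  inbox.listen((message) {
    if (message == null) {
      inbox.close();
      return;
    }
    final request = message as List;
    (request[1] as SendPort).send(makeResult(request[0] as int));
  });
}

// Cost of spawning a fresh isolate per task, result handed back on exit.
Future<Duration> timeFreshIsolate(int length) async {
  final watch = Stopwatch()..start();
  await Isolate.run(() => makeResult(length));
  return watch.elapsed;
}

// Cost of reusing a running isolate, result copied back through a port.
Future<Duration> timeReusedIsolate(SendPort worker, int length) async {
  final watch = Stopwatch()..start();
  final reply = ReceivePort();
  worker.send([length, reply.sendPort]);
  await reply.first;
  return watch.elapsed;
}

Future<void> main() async {
  final fromWorker = ReceivePort();
  await Isolate.spawn(workerMain, fromWorker.sendPort);
  final toWorker = await fromWorker.first as SendPort;

  for (final length in [10, 1000, 100000, 1000000]) {
    final fresh = await timeFreshIsolate(length);
    final reused = await timeReusedIsolate(toWorker, length);
    print('length=$length '
        'fresh=${fresh.inMicroseconds}us '
        'reused=${reused.inMicroseconds}us');
  }

  toWorker.send(null); // let the worker shut down
}
```

The numbers will differ between machines, between JIT and AOT builds, and with program size, so the crossover you see applies only to the setup you measured.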

The one further risk of an isolate pool is that you may get less parallelization. If you have 10 isolates in the pool, and you run 20 tasks, it will at most run 10 of those at a time. With 20 isolates, it can hypothetically run twice as fast, provided the user has 20 CPU cores to run on, they're not doing anything else, and the stars are just right. (And provided you can spawn 20 isolates. If there is a limit on how many isolates one can create, then a pool can help avoid hitting it, but going too close to the limit might break other libraries which try to create their own isolates.)

But you can also use a growing isolate pool, which creates new isolates as needed so that every concurrent request has its own isolate, and reuses an isolate only once its computation is done.
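A growing pool along those lines might look roughly like the following. `GrowingPool`, `workerMain` and `square` are hypothetical names, there is no upper bound or idle eviction, and a task that throws in the worker would leave the caller waiting, so treat it as a sketch of the idea rather than something to ship:

```dart
import 'dart:isolate';

// Hypothetical task, standing in for the real computation.
int square(int n) => n * n;

// Worker entry point: reports its SendPort, then serves until null.
void workerMain(SendPort toPool) {
  final inbox = ReceivePort();
  toPool.send(inbox.sendPort);
  inbox.listen((message) {
    if (message == null) {
      inbox.close();
      return;
    }
    final request = message as List;
    (request[1] as SendPort).send(square(request[0] as int));
  });
}

// Reuses an idle worker when one exists, otherwise spawns a new one,
// so no request ever waits behind another request.
class GrowingPool {
  final List<SendPort> _idle = [];
  int spawned = 0;

  Future<SendPort> _acquire() async {
    if (_idle.isNotEmpty) return _idle.removeLast();
    final handshake = ReceivePort();
    await Isolate.spawn(workerMain, handshake.sendPort);
    spawned++;
    return await handshake.first as SendPort;
  }

  Future<int> run(int input) async {
    final worker = await _acquire();
    final reply = ReceivePort();
    worker.send([input, reply.sendPort]);
    try {
      return await reply.first as int;
    } finally {
      _idle.add(worker); // back in the pool only once the task is done
    }
  }

  // Tells every idle worker to shut down.
  void close() {
    for (final worker in _idle) {
      worker.send(null);
    }
    _idle.clear();
  }
}

Future<void> main() async {
  final pool = GrowingPool();
  // Three concurrent requests: the pool grows to three workers.
  print(await Future.wait([pool.run(2), pool.run(3), pool.run(4)]));
  // A later request reuses an idle worker instead of spawning a fourth.
  print(await pool.run(5));
  print('workers spawned: ${pool.spawned}'); // prints workers spawned: 3
  pool.close();
}
```

The pool never shrinks here, so its size ends up equal to the peak number of concurrent requests.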

Then there is the memory cost of keeping isolates alive when they aren't needed any more. (And that's when one starts considering garbage-collecting isolates if usage drops for a while, or keeping a hard maximum number of isolates, and all the other considerations you'd have for resource pools in general.)

There will be some "pool maintenance" cost, but that's likely to be negligible compared to the actual computations.

And there needs to be a pool strategy, which is at least:

All these decisions factor into how efficient the pool will be. So try, and measure. There is no one answer which fits all programs.

(One example of a load-balancing pool is LoadBalancer, from the no-longer-maintained package:isolate. Whether it fits your goals depends on what those goals are.)