European-XFEL / pasha

Functional-style data processing parallelized in shared memory
BSD 3-Clause "New" or "Revised" License

Allow ahead-of-time pool allocation #3

Open philsmt opened 3 years ago

philsmt commented 3 years ago

Right now, a new pool is allocated whenever map() is called. For near-real-time use cases and general optimization, it would be nice to allocate the pool once and reuse it for several passes of calculations. Special care has to be taken that shared memory is allocated before the pool is created.
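As a rough illustration of the idea (not pasha's API), here is a minimal sketch using only the standard library's `multiprocessing`. It assumes a POSIX platform where the `fork` start method is available; `out`, `square_into` and `run_passes` are hypothetical names made up for this example. The shared buffer is created first, then the pool is created once and reused for several passes.

```python
import multiprocessing as mp

# Shared output buffer. It is allocated BEFORE the pool is created so that
# the forked workers inherit the same underlying memory.
out = mp.RawArray('d', 8)


def square_into(i):
    # Hypothetical kernel: each call writes one result into shared memory.
    out[i] = float(i * i)


def run_passes(n_passes=3):
    ctx = mp.get_context('fork')  # assumption: fork is available (POSIX)
    with ctx.Pool(processes=2) as pool:  # pool allocated once...
        for _ in range(n_passes):  # ...and reused for every pass
            pool.map(square_into, range(len(out)))
    return list(out)


if __name__ == '__main__':
    print(run_passes())
```

If the buffer were allocated after the pool, the already-forked workers would have no handle on it, which is the ordering constraint mentioned above.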

takluyver commented 3 years ago

If you're using a process pool, you also need to set up any inputs before allocating the pool, I think - they have to be there when you fork. That could easily be confusing to people who haven't studied the internals.
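To make the pitfall concrete, here is a small sketch with plain `multiprocessing` (again not pasha itself, and assuming the `fork` start method on a POSIX system). `data`, `read_len` and `demo_fork_timing` are hypothetical names. A pool forked before the input is assigned keeps seeing the old value, while a pool forked afterwards sees the new one.

```python
import multiprocessing as mp

data = None  # module-level input that the workers read


def read_len(_):
    # Reports what the worker process sees in its own copy of `data`.
    return None if data is None else len(data)


def demo_fork_timing():
    global data
    ctx = mp.get_context('fork')  # assumption: fork is available (POSIX)
    data = None
    with ctx.Pool(1) as early_pool:  # worker forked before the input exists
        data = [1, 2, 3]  # assigned in the parent only
        early = early_pool.map(read_len, [0])
    with ctx.Pool(1) as late_pool:  # worker forked after the input exists
        late = late_pool.map(read_len, [0])
    return early, late


if __name__ == '__main__':
    print(demo_fork_timing())
```

With a fork-based pool this should give `([None], [3])`: the early worker holds a snapshot taken before `data` was assigned.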

When I added an option for multithreaded assembly to EXtra-geom (https://github.com/European-XFEL/EXtra-geom/pull/17), I designed it so you could reuse a thread pool rather than creating a new one each time. But my experiments with it suggested that starting & stopping a concurrent.futures.ThreadPoolExecutor is actually pretty cheap, so there's not much pressure to reuse one. (Unfortunately I didn't write down the absolute numbers I saw, just that it was quick compared to assembling images.)
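For anyone who wants to measure this on their own machine, a minimal timing sketch could look like the following. `time_pool_cycles` is a hypothetical helper, and the trivial `abs` task is only there because `ThreadPoolExecutor` starts its threads lazily on submission. No numbers are quoted here since they depend on the system.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_pool_cycles(n_cycles=100, workers=4):
    """Return the average seconds per create/use/shutdown cycle."""
    start = time.perf_counter()
    for _ in range(n_cycles):
        with ThreadPoolExecutor(max_workers=workers) as executor:
            # Submit trivial work so that worker threads are actually started.
            list(executor.map(abs, range(workers)))
    return (time.perf_counter() - start) / n_cycles


if __name__ == '__main__':
    print(f"{time_pool_cycles() * 1e3:.3f} ms per cycle")
```

Comparing this figure with the duration of one real processing pass gives a sense of whether reusing the pool is worth the extra API surface.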

So I might suggest leaving this on the back burner until there's a clearer need. I think it's actually a strength for fork-based parallelism that defining your pool (creating a ProcessContext) isn't tied to forking, and forking is delayed until you call map().
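The benefit of delaying the fork can be sketched with a toy class. `LazyForkContext` below is a hypothetical stand-in written for this comment, not pasha's `ProcessContext`, and it assumes the `fork` start method on a POSIX system. Creating the context only stores configuration; the workers are forked inside `map()`, so inputs defined in between are still visible to them.

```python
import multiprocessing as mp


class LazyForkContext:
    """Hypothetical stand-in for a context that forks only in map()."""

    def __init__(self, num_workers):
        self.num_workers = num_workers  # configuration only, no processes yet

    def map(self, func, items):
        ctx = mp.get_context('fork')  # assumption: fork is available (POSIX)
        with ctx.Pool(self.num_workers) as pool:  # the fork happens here
            return pool.map(func, items)


offset = 0  # module-level input read by the workers


def add_offset(x):
    return x + offset


def demo_lazy():
    global offset
    lazy = LazyForkContext(num_workers=2)
    offset = 10  # defined after the context was created, before map()
    return lazy.map(add_offset, [1, 2, 3])


if __name__ == '__main__':
    print(demo_lazy())
```

Here the workers see `offset = 10` even though it was set after the context was created, which is the property that an ahead-of-time pool would give up.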