JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

feature request: @distributed_threads and pmap_threads #67

Open orenbenkiki opened 4 years ago

orenbenkiki commented 4 years ago

In a setup where:

Creating such a setup isn't trivial: one needs to tweak how worker processes are launched so that they are multi-threaded. It would be easy if julia had a command-line flag specifying the number of threads, as requested in JuliaLang/julia#34309. Still, it is possible to create such a setup today with a bit of effort, and it is useful: all the threads in each worker process automatically share "everything" in memory, rather than being restricted to constructs such as SharedArray. Of course, this means one needs to be careful.

In such a scenario, the current behavior is very clear:

This has the advantage of simplicity and clarity. It also allows using a nested @threads in each iteration of @distributed or pmap to utilize all the threads in all the machines.
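To illustrate the nesting described above, here is a minimal sketch. The names `work` and `process_chunk` and the chunk size are made up for the example; it assumes workers (if any) were already added with `addprocs`, and falls back to the current process when there are none.

```julia
using Distributed

# Hypothetical per-item work, for illustration only.
@everywhere work(x) = x^2

# Each pmap task receives one chunk. Inside it, @threads spreads the chunk
# over the threads of whichever process happens to run the task.
@everywhere function process_chunk(chunk)
    out = Vector{Int}(undef, length(chunk))
    Threads.@threads for i in eachindex(chunk)
        out[i] = work(chunk[i])
    end
    return out
end

# The chunk size (5) is arbitrary; pmap hands chunks to processes dynamically.
chunks = collect(Iterators.partition(1:20, 5))
results = reduce(vcat, pmap(process_chunk, chunks))
```

Note that the chunking here is manual and ignores how many threads each process has, which is the gap the proposal below is about.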

However, it would also be useful to have @distributed_threads and pmap_threads.

A @distributed_threads would statically allocate the same number of iterations to each thread across all the machines. That is, it would allocate more iterations to worker processes with more threads, and then internally use @threads to execute them on the threads of each worker process. This would be the natural extension of @distributed, which uses static allocation of iterations to processes.
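The allocation arithmetic could look roughly like the sketch below. `split_by_threads` is a hypothetical helper, not part of Distributed; it only computes which contiguous range of iterations each worker process would receive, given the thread count of each.

```julia
# Split `n` iterations into contiguous ranges, one per worker process,
# sized in proportion to each worker's thread count.
function split_by_threads(n::Integer, threads::Vector{Int})
    total = sum(threads)
    ranges = UnitRange{Int}[]
    start = 1
    cum = 0
    for t in threads
        cum += t
        stop = div(n * cum, total)   # cumulative share, rounded down
        push!(ranges, start:stop)
        start = stop + 1
    end
    return ranges
end

split_by_threads(12, [4, 2])  # returns [1:8, 9:12]
```

The thread counts could be collected with something like `remotecall_fetch(Threads.nthreads, w)` for each worker `w`; each worker would then run its own range under `Threads.@threads`.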

A pmap_threads would dynamically allocate tasks to each thread across all machines. The batch size, if specified, would apply to each thread individually. It might be useful to add a second parameter, a batch group size (a positive number of batches), so that each worker process gets a whole group of batches at once and uses its threads to execute the smaller batches. This would reduce the amount of cross-process coordination required. This would be the natural extension of pmap, which uses dynamic allocation of iterations to processes.
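A rough sketch of the two-level batching, to make the idea concrete. The names `pmap_threads_sketch` and `run_group` and the keywords `batch_size` and `group_size` are made up for this example and are not an existing API. It assumes `f` is available on all workers and that `items` is non-empty.

```julia
using Distributed

# Runs on a worker: split one group into batches and run each batch as a
# task that the worker's threads pick up dynamically.
@everywhere function run_group(f, group, batch_size)
    batches = collect(Iterators.partition(group, batch_size))
    tasks = map(batches) do b
        Threads.@spawn map(f, b)
    end
    return reduce(vcat, fetch.(tasks))
end

# Runs on the caller: pmap hands whole groups (group_size batches each) to
# worker processes dynamically, so cross-process messages happen per group.
function pmap_threads_sketch(f, items; batch_size=1, group_size=1)
    groups = collect(Iterators.partition(items, batch_size * group_size))
    return reduce(vcat, pmap(g -> run_group(f, g, batch_size), groups))
end

pmap_threads_sketch(x -> x + 1, collect(1:10); batch_size=2, group_size=2)
```

With these settings the ten items form groups of four, so the caller coordinates three groups instead of five batches. A real implementation would also need to size groups according to each worker's thread count.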