JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Nested PMAP calls do not work #62

Open abx78 opened 5 years ago

abx78 commented 5 years ago

Hello, the following code, which has nested pmap calls, hangs even if I execute it with a single worker process (addprocs(1)).

using Distributed
@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task  = collect(1:2)
    pmap(inner_process, inner_task)
end

function do_job(jobs_list) 
    pmap(outer_process, jobs_list)
end

jobs_list = collect(1:10)
do_job(jobs_list)

This is the version I am using

julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_DEPOT_PATH = C:\Program Files\ReSolver.DistributedJulia\depot
  JULIA_EDITOR = C:\Users\AppData\Local\atom\atom.exe
  JULIA_PKGDIR = C:\Program Files\ReSolver.DistributedJulia\packages

Am I misunderstanding the feature, or is this expected behavior?

jpsamaroo commented 5 years ago

Note that this code executes correctly if no worker processes are added, i.e. one does not run addprocs and just uses the master process.
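For reference, a minimal sketch of that worker-free case (not part of the original comment; the expected return values are inferred from the code above):

using Distributed            # note: no addprocs(), so workers() == [1]

@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    pmap(inner_process, 1:2)     # runs on the master process only
end

pmap(outer_process, 1:10)        # completes; each element should be [1, 4]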

dkarrasch commented 5 years ago

I guess this is expected behavior, since the outer pmap already employs (by default) all workers, so there are none left to distribute the inner calls to. Check the ?pmap docstring. However, when you specify which workers to use, everything works fine even with nested pmaps:

using Distributed
addprocs() # this yields workers 2:5 on my machine
@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task  = collect(1:2)
    pmap(inner_process, WorkerPool([4, 5]), inner_task)  # inner pmap uses only workers 4 and 5
end

function do_job(jobs_list) 
    pmap(outer_process, WorkerPool([2, 3]), jobs_list)  # outer pmap uses only workers 2 and 3
end

jobs_list = collect(1:10)
do_job(jobs_list)

returns the same result as running your original code on a single worker process.
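As a side note, a quick sketch of how to inspect the pool that pmap falls back to when none is given (the worker ids shown are whatever addprocs yields on your machine):

using Distributed
addprocs(4)               # e.g. workers 2:5
workers()                 # [2, 3, 4, 5]
default_worker_pool()     # the pool pmap uses by default; it excludes process 1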

alandion commented 5 years ago

This example also works for me if I use a WorkerPool containing all processes, including the master process. It seems the default worker pool for pmap doesn't contain process 1.

using Distributed
addprocs() # this yields workers 2:5 on my machine
@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task  = collect(1:2)
    pmap(inner_process, WorkerPool(procs()), inner_task)
end

function do_job(jobs_list) 
    pmap(outer_process, WorkerPool(procs()), jobs_list)
end

jobs_list = collect(1:10)
do_job(jobs_list)

OkonSamuel commented 4 years ago

@alandion It also works if a worker pool containing all workers, excluding the master process, is created. It seems the trick is to create an explicit worker pool when nesting? (Unlike the example given by @dkarrasch, here I am sharing all workers between the outer and inner pmap.)

using Distributed
addprocs() # this yields workers 2:5 on my machine
@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task  = collect(1:2)
    pmap(inner_process, WorkerPool(workers()), inner_task)
end

function do_job(jobs_list) 
    pmap(outer_process, WorkerPool(workers()), jobs_list)
end

jobs_list = collect(1:10)
do_job(jobs_list)

For me it only hangs when an explicit WorkerPool or CachingPool isn't passed as an argument.
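
Putting the thread's observations together, a minimal sketch of the workaround (the choice of CachingPool for the outer call is just one option; per the comments above, any explicit pool should do):

using Distributed
addprocs(4)

@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    # explicit pool for the nested call, as in the examples above
    pmap(inner_process, WorkerPool(workers()), 1:2)
end

# CachingPool additionally caches the mapped function on each worker
pmap(outer_process, CachingPool(workers()), 1:10)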