JuliaLang / Distributed.jl

Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
https://docs.julialang.org/en/v1/stdlib/Distributed/
MIT License

Documentation of Distributed.remotecall() misleading #81

Open torrance opened 2 years ago

torrance commented 2 years ago

Specifically considering the AbstractWorkerPool implementation of this method, remotecall(f, pool::AbstractWorkerPool, args...; kwargs...), the documentation states:

WorkerPool variant of remotecall(f, pid, ....). Wait for and take a free worker from pool and perform a remotecall on it.

The impression from the documentation is that this function will only submit jobs to idle workers (which in my use case was desirable behaviour).

However, "wait for and take a free worker" does not accurately convey what happens. The method waits until it can take a worker from the pool, but that worker is not necessarily 'free' in the sense of being idle. This is because the inner remotecall() returns immediately, so the method takes a worker and puts it straight back into the pool while the submitted job may still be running.
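The take-then-return pattern described above can be sketched with a toy model. This is not the Distributed implementation, and the real behaviour may differ between Julia versions. `make_pool` and `toy_remotecall` are hypothetical names, a `Channel` of worker ids stands in for the pool, and a local `@async` task stands in for the remote job.

```julia
# Toy model only: a Channel of worker ids stands in for a worker pool.
function make_pool(ids)
    pool = Channel{Int}(length(ids))
    foreach(id -> put!(pool, id), ids)
    return pool
end

# Hypothetical stand-in for the behaviour described above: take a worker,
# start the job without waiting for it, and hand the worker straight back.
function toy_remotecall(job, pool, assigned)
    w = take!(pool)              # blocks only if the pool is empty
    try
        push!(assigned, w)       # record which worker was given the job
        return @async job(w)     # returns at once; the job is still running
    finally
        put!(pool, w)            # worker is back in the pool although it is busy
    end
end

pool = make_pool([2, 3])
assigned = Int[]                 # worker ids in dispatch order
tasks = [toy_remotecall(w -> sleep(0.2), pool, assigned) for _ in 1:4]
println(assigned)                # [2, 3, 2, 3]
foreach(wait, tasks)
```

In this model the pool is never empty when the next job is submitted, so `take!` never blocks and jobs are handed out round-robin regardless of whether earlier jobs have finished.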

The documentation should:

  1. state that this function assigns work to the workers in the pool cyclically,
  2. state that it returns immediately,
  3. omit any mention of 'waiting for free workers'.

GregPlowman commented 2 years ago

Agree, the documentation could and should be improved to avoid misunderstanding.

However, technically it is correct to say "Wait for and take a free worker from pool ...".

If the workers are assigned jobs only from calls to remotecall, which returns immediately, then it will appear that each worker is never busy. However, a worker might be busy from some other call.

In the example below, remotecall_wait keeps the first worker busy, so the remotecall calls inside the for loop all run on the only free worker.

using Distributed
addprocs(2)
wp = default_worker_pool()

@everywhere function f(i, n)
    println("Starting... $(i)")
    sleep(n)
    println("Done $(i).")
end

@async remotecall_wait(f, wp, 1, 10)    # keeps first worker busy

for i in 2:5
    remotecall(f, wp, i, 5) # runs asynchronously on free worker only, returns immediately
end
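For the use case raised at the top of the thread (submitting jobs to idle workers only), one possible approach is to use a blocking pool call from separate tasks, so that each worker stays out of the pool until its job has finished. The following is a minimal sketch, assuming two local workers and a made-up job that sleeps and then reports the id of the worker that ran it; it is an illustration, not a recommendation from the Distributed maintainers.

```julia
using Distributed

addprocs(2)                          # assumption: two local worker processes
pool = WorkerPool(workers())

results = Vector{Int}(undef, 4)      # id of the worker that ran each job

# Each task blocks in remotecall_fetch until its job has finished, and the
# worker it took is only put back into the pool at that point. With two
# workers, at most two jobs are in flight at any time.
@sync for i in 1:4
    @async results[i] = remotecall_fetch(pool) do
        sleep(1)                     # simulate one second of work
        myid()                       # report which worker ran the job
    end
end

println(results)                     # worker ids; the order depends on scheduling
```

The `@async` wrapper keeps the calling task free to submit the remaining jobs, while `@sync` waits for all of them before continuing.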