Closed vchuravy closed 6 years ago
The reason I left myid
and nprocs
in Base is to help library writers detect if their code is running in distributed mode or not.
if nprocs() > 1
# distributed mode - pmap, `@parallel for` are all available
if myid() == 1
foo_master_in_distmode()
else
foo_worker_in_dist_mode()
end
else
foo_non_dist_mode_do_something_else()
end
We can relatively easily provide a distributed_mode()
function or variable. https://github.com/JuliaLang/julia/blob/fa801bb2c83114228f69cd9db09c5cfafebca999/base/client.jl#L289 is essentially that check.
Alternatively that check correctly reads right something like Base.isbindingresolved(Main, :Distributed) && !(Base.isdeprecated(Main, :Distributed)) && invokelatest(getfield, Main, :Distributed) === Base.root_module(:Distributed)
which is why I abandoned the approach in #25119 and every call to Base.myid needs to do that... (we could cache that lookup, but still...)
distributed_mode()
as implemented will not work for distributed mode started via addprocs
using Distributed
addprocs()
using FooLibrary
FooLibrary.foo() # `foo` needs to know if it us running in distributed mode or not.
nprocs()
is the correct check for it.
You are correct that distributed_mode()
currently disregards dynamic addprocs()
.
Let me make sure that I understand your concern correctly, FooLibrary
needs to know that it is running in a distributed mode, but can't have a import Distributed: nprocs
?
If the library is already aware of distributed programming I would expect that it does at some point do a using Distributed
. Is there are scenario where that wouldn't be true?
@amitmurthy that's what the Base.package_callbacks
are for (so you can set flags for when Distributed
is loaded).
@vtjnash Distributed
is currently "loaded" by default in sysimg.jl (no bindings are put in Main though), so I am not sure if Base.package_callbacks
is applicable here.
@vchuravy this is more future proofing at this point with the assumption that one of the drivers for isolating the distributed functionality in a module and moving into stdlib is to make it easy to support multiple "distributed" implementations. In this scenario, the library will not execute a using Distributed
, but will expect the main driver script to setup a cluster and load an appropriate distributed functionality. The library will call a pmap
if running in dist mode, which may be Distributed.pmap
or FooDistributed.pmap
(with the same function signature) as the case may be. And yes, FooDistributed
will have to replace Base.myid
and Base.nprocs
with its own implementation.
But if the library is calling pmap
or any other functionality from Distributed
or FooDistributed
it will have to import either package since pmap
won't be available otherwise. So wither the library is oblivious of distributed computing and we can use dispatch on types to use a parallel map (like DistributedArrays
) or it has to be aware which Distributed
implementation it is going to be using.
My main goal in moving Distributed
to stdlib was to no longer tie development efforts to Julia versions.
Could you please give some rationale for why nprocs
is being removed from Base? How this affects package writers? In the case it is indeed removed, could you please provide an alternative to the pattern @amitmurthy described? I really on it heavily, for example here: https://github.com/juliohm/GeoStats.jl/blob/master/src/solvers.jl#L30-L37
From a user's perspective, this is how I understood the mechanics of distributed parallelism: http://nbviewer.jupyter.org/github/juliohm/GeoStats.jl/blob/master/examples/ParallelSimulation.ipynb
Is it correct? Will it change?
All of the Distributed
processing tools have been moved to a stdlib package. So the only thing that will change is that at some-point in GeoStats.jl
you will need to say using Distributed
. The only thing this issue is about is whether to also move myid
and nprocs
to Distributed
as well, everything else has already moved.
@amitmurthy We could make myid
and nprocs
fully hookable, but that leads to fairly inefficient code.
myid_func = Ref{Any}(()->1)
myid() = invokelatest(myid_func[])::Int
Then Distributed
can later on say: myid_func[] = Distributed.myid
, but I really would advise against the use of invokelatest
if we can avoid it.
In my books a using Distributed
or import Distributed: myid, nprocs
is not to much to ask of package writers.
Right. I was under the impression that bindings created under Main
are visible to loaded packages - this was the root of my misunderstanding.
No objections to moving myid
and nprocs
out of Base now.
Are there cases where you need to check nprocs
even though you don't use other distributed functionality? It would be bad to require adding a dependency on Distributed
in those cases if so.
If Distributed is a large dependency, I agree with @JeffBezanson , it would be great to be able to check if nprocs() > 1
or nworkers() > 1
as a package writer without a heavy dependency. If it is a light dependency, which will stay light forever, I don't mind it at all.
Why not just have an isdistributed()
predicate in Base similar to isinteractive()
? Checking the number of processes or workers is not really the question we want the answer to.
Maybe isdistributed()
and ismultiprocess()
? As @vchuravy explained to me the other day, my use cases are usually with nworkers()
: https://discourse.julialang.org/t/clarification-on-pmap-master-worker-model/7824
We don't need a predicate for every yes-no question you might ever ask. The reason for isdistributed
is to know if it's safe to do nworkers() > 1
or not.
@StefanKarpinski isdistributed
is effectively nprocs() > 1
.
To answer Jeff's question, I can imaging a scenario where a library may opt to support different types of parallelism.
module Foo
if nprocs() > 1 # dist mode
using DistributedArrays # this in turn loads Distributed
elseif is_gpu_available() # GPU parallelism
using GPUArrays
else
threading=true
end
function foo()
if nprocs() > 1
.... # Use DArrays
elseif is_gpu_available()
.... # Use GPUArrays
else
.... # Use multithreading
end
end
end
After a brief discussion with @vtjnash it seems that the remaining uses of
myid
andnprocs
can be factored out and these two function can move fully toDistributed
.